Abstract — This paper proposes a novel algorithm for finding
error-locators of algebraic-geometric codes that eliminates
the division calculations of finite fields from the Berlekamp-Massey-Sakata
algorithm. This inverse-free algorithm provides
full performance in correcting a certain class of errors, generic
errors, which includes most errors, and can decode codes on algebraic
curves without the determination of unknown syndromes.
Moreover, we propose three different kinds of architectures
to which our algorithm can be applied, and we present the
control operation of shift-registers and switches at each clock
timing with numerical simulations. We estimate the performance
by comparing the total running time and the numbers of
multipliers and shift-registers in the three architectures with those
of the conventional ones for codes on algebraic curves.

Index Terms — codes on algebraic curves, syndrome decoding,
Berlekamp-Massey-Sakata algorithm, Gröbner basis, linear
feedback shift-register.



I. Introduction 

ALGEBRAIC-GEOMETRIC (AG) codes, especially codes
on algebraic curves, are a comprehensive generalization of the
prevailing Reed-Solomon (RS) codes. They can be applied to
various systems by choosing suitable algebraic curves without
any extension to huge finite (Galois) fields. In fast decoding
of such codes, the Berlekamp-Massey-Sakata (BMS) algorithm
[25] is often used for finding the location of errors, and the
evaluation of error-values is done by using outputs of the BMS
algorithm with O'Sullivan's formula [24].

RS codes have the features of high error-correcting capability
and low complexity in the implementation of encoder
and decoder. On the other hand, codes on algebraic curves
have issues related to the size of decoders as well as the
operating speed of decoders. In particular, we notice that RS-code
decoders need no inverse-calculator of the finite field
(no finite-field inverter). The extended Euclidean algorithm
[30] for RS codes has no divisions, and this enables us to
operate compactly and quickly in calculating error-locator
and error-evaluator polynomials. One inverse computation
requires thirteen multiplications in practical $\mathrm{GF}(2^8)$ and needs
enormous circuit scale. Thus, a
fast inverse-free algorithm for AG codes is strongly desired,
since division operations are inevitable in the original BMS
algorithm. In addition, a decoder with small circuit size,
such as the conventional RS decoder, is considered necessary.

In this paper, we propose an inverse-free BMS algorithm, 
and give a whole proof of its adequacy. Moreover, we propose 
three kinds of small-sized architectures that generate error- 
locator polynomials for codes on algebraic curves. We then 
explain our architectures with model structures and numerical 
examples, and show the practical operation of proposed archi- 
tectures in terms of the control flow of registers and switches 
at each clock-timing. The performance is estimated on the total 
running time and the numbers of multipliers and shift-registers 
for all architectures. 

The divisions in the original BMS algorithm appear at the
Berlekamp transform [1]

\[ f_{N+1} := f_N - (d_N/\delta_N)\, g_N \tag{1} \]

at each $N$-loop in the algorithm, where $f_N$, $g_N$, and $d_N$
are called minimal polynomial, auxiliary polynomial, and
discrepancy at $N$, respectively, $N$ runs over $0 \le N \le B$
for sufficiently large $B$, and $\delta_N$ is equal to a certain previous
$d_N$. Then the inverse-free BMS algorithm consists of modified
Berlekamp transforms of the form

\[ f_{N+1} := e_N f_N - d_N g_N, \tag{2} \]

where $e_N$ is equal to a certain previous $d_N$ in this expression.
Thus the denominator $\delta_N$ in (1) is converted into the
multiplication by $e_N$ in (2). This version of the inverse-free BMS
algorithm can be proved along the same lines as the original
algorithm. However, there is a significant obstacle to applying
this inverse-free algorithm to the decoders for AG codes:
the existence of unknown syndromes,
namely, the lack of syndrome values needed to decode errors whose
Hamming weights are less than or equal to even the basic
$\lfloor (d_G - 1)/2 \rfloor$, where $d_G$ is the Goppa (designed) minimum
distance. Feng and Rao's paper [3] originally proposed a majority
logic scheme to determine unknown syndromes in the
decoding up to $\lfloor (d_{FR} - 1)/2 \rfloor$, where $d_{FR}$ is their designed
minimum distance $\ge d_G$. Subsequently, Sakata et al. [26] and
independently Kötter [7] modified and applied Feng-Rao's
method to their decoding algorithms. If the divisions of the
finite field are removed from the BMS algorithm, one cannot
execute the determination of unknown syndromes, because
the generation of candidate values of unknown
syndromes for majority voting breaks down. Unfortunately, the elimination
of finite-field divisions seemed to be a difficult problem in
this regard. For this reason, no inverse-free algorithm for AG
codes has been proposed until now.

Fig. 1. Map of various error-locator architectures implementing BMS (or
equivalent) algorithm for decoding codes on algebraic curves.
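
To make the relation between the transforms (1) and (2) concrete, the following minimal sketch compares one division-based update with one inverse-free update. It is only an illustration under stated assumptions: the coefficients live in the small prime field GF(17) rather than in GF(2^n), the coefficient lists `f` and `g` are made-up data, and the helper names `with_division` and `inverse_free` are hypothetical, not taken from the paper.

```python
# Toy comparison of the Berlekamp transform (1) and the inverse-free transform (2).
# Assumptions: arithmetic in the prime field GF(P); f and g are coefficient lists
# of equal length; the previous discrepancy (delta in (1), e in (2)) is nonzero.
P = 17


def with_division(f, g, d, delta):
    """Return f - (d / delta) * g, which needs one field inversion."""
    ratio = d * pow(delta, P - 2, P) % P  # Fermat inverse: the costly step
    return [(fc - ratio * gc) % P for fc, gc in zip(f, g)]


def inverse_free(f, g, d, e):
    """Return e * f - d * g, which needs multiplications only."""
    return [(e * fc - d * gc) % P for fc, gc in zip(f, g)]


f = [3, 0, 5, 1]   # hypothetical coefficients of f_N
g = [2, 7, 0, 0]   # hypothetical coefficients of g_N
d, delta = 4, 6    # hypothetical discrepancy and previous discrepancy

divided = with_division(f, g, d, delta)
free = inverse_free(f, g, d, delta)
# Multiplying the divided result by delta reproduces the inverse-free result.
scaled = [delta * c % P for c in divided]
print(free == scaled)
```

The two results differ only by the nonzero factor `delta`, so they have the same zeros; what is lost is the normalization of the leading coefficient, which is the property the majority-voting step relies on.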

In this research, we effectively overcome this difficulty.
Namely, we decode such codes with only the known syndrome
values from received code-words. So far the type and amount
of errors that can be corrected without determining
unknown syndromes have not been clear; the well-known
bound up to $\lfloor (d_G - g - 1)/2 \rfloor$ in the Peterson-type algorithm [6],
where $g$ is the genus of the underlying algebraic curve, is not
available for our case of the BMS algorithm. We confirm that
a class of generic errors [12], [23] (independent errors in [5])
can be corrected up to $\lfloor (d_{FR} - a)/2 \rfloor$ only with syndromes
from received words, where $a$ is the minimal pole order of
the underlying algebraic curve: $a = 2$ for elliptic curves over
arbitrary finite fields and $a = 16$ for the Hermitian curve over
$\mathrm{GF}(2^8)$. Furthermore, we successfully obtain the approximate
ratio $(q - 1)/q$ of the generic errors to all errors by
applying Gröbner-basis theory, where $q$ is the number of
elements in the finite field. It means that we can decode most
of the errors without the majority logic scheme and voting. Thus
we can not only realize inverse-free error-locator architectures
for AG codes but also avoid the complicated procedure and
transmission of voting data among parts of decoders. Our
method is applicable to all former architectures, and is not
a return to the past but a real solution for constructing decoders
with feasible circuit-scale.

Recently, the BMS algorithm has become more important
not only in decoding codes on algebraic curves but also in
algebraic soft-decision decoding [8] of RS codes. Sakata et al.
[22], [28] applied the BMS algorithm to the polynomial interpolation
in the Sudan and Guruswami-Sudan algorithms [4], [29] for
RS codes and codes on algebraic curves. Lee and O'Sullivan
[9], [10] applied the Gröbner-basis theory of modules, which is
related to the BMS algorithm, to soft-decision decoding of RS
codes. Our method can be expected to help further structural
analysis of these methods.

The rest of this paper is organized as follows. In Section II,
we prepare notations, and define codes on algebraic curves.
In Section III, we propose an inverse-free BMS algorithm,
and state the main theorem for output of the algorithm. In
the next three sections, we describe three types of small-scale
error-locator architectures, i.e., inverse-free, serial, and serial
inverse-free architectures; the mutual relations among them
and past architectures are depicted in Fig. 1. In Section IV,
we describe the inverse-free architecture, and divide it into
three subsections: Subsection IV-A is an overview, Subsection
IV-B deals with the technique for avoiding the determination
of unknown syndromes, and Subsection IV-C is a numerical
simulation. In Section V, we describe the serial architecture
using the parallel BMS algorithm. In Section VI, we describe the
serial inverse-free architecture combined with the previous
methods. In Section VII, we estimate the total running time
and the numbers of finite-field calculators for the three proposed
and the past architectures. Finally, in Section VIII, we state our conclusions.
In the appendices, we prove the basics of the BMS algorithm, the
property of generic errors, and the main theorem of the proposed
algorithm.

II. Preliminaries 

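In this paper, we consider one-point algebraic-geometric
codes on non-singular plane curves over a finite field $K := \mathbb{F}_q$,
in particular $\Omega$-type codes (not $L$-type). Let $\mathbb{Z}_0$ be the set of
non-negative integers, and let $a, b \in \mathbb{Z}_0$ be $0 < a < b$ and
$\gcd(a, b) = 1$. We define a $C_a^b$ curve $\mathcal{X}$ by an equation

\[ D(x, y) := y^a + e x^b + \sum_{\substack{(n_1, n_2) \in \mathbb{Z}_0^2 \\ n_1 a + n_2 b < ab}} \chi_{(n_1, n_2)}\, x^{n_1} y^{n_2} \tag{3} \]
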
over $K$ with $e \neq 0$. Then the polynomial quotient ring
$K[\mathcal{X}] := K[x, y]/(D(x, y))$ consists of all the algebraic
functions having no poles except at the unique infinite point
$P_\infty$. Let $\{P_j\}_{1 \le j \le n}$ be a set of $n$ $K$-rational points except
$P_\infty$. We denote the pole order of $F \in K[\mathcal{X}]$ at $P_\infty$ as $o(F)$.
For $m \in \mathbb{Z}_0$, the $K$-linear subspace

\[ L(mP_\infty) := \{F \in K[\mathcal{X}] \mid o(F) \le m\} \cup \{0\} \]

has dimension $m - g + 1$, provided $m > 2g - 2$ by the Riemann-Roch
theorem, which we assume for simplicity in this paper.
Our code $C(m)$ is defined as

\[ C(m) := \Big\{ (c_j) \in K^n \ \Big|\ \sum_{j=1}^{n} c_j F(P_j) = 0,\ \forall F \in L(mP_\infty) \Big\}. \]

As shown in [20], [21], the class of $C_a^b$ curves is sufficiently
wide and contains almost all well-known plane algebraic
curves that have many $K$-rational points, such as Hermitian
curves. Although Miura in [21] defined a more general class
${}_rC_a^b$ including Klein's quartic curve, we consider mainly $C_a^b$
for simplicity.

Throughout this paper, we denote $t$ as the number of
correctable errors. Given a received word $(r_j) = (c_j) + (e_j)$,
where $e_j \neq 0 \Leftrightarrow j \in \{j_1, \cdots, j_t\}$ corresponding to a set of
error-locations $\mathcal{E} = \{P_{j_\gamma}\}_{1 \le \gamma \le t}$, we need to find a Gröbner
basis [2] of the error-locator ideal

\[ I(\mathcal{E}) := \{F \in K[\mathcal{X}] \mid F(P_{j_\gamma}) = 0 \text{ for } \forall P_{j_\gamma} \in \mathcal{E}\}. \]
Fig. 2. Pole orders on $\Phi(5, 15)$ defined by $o(n) := 3n_1 + 2n_2$, and pole orders on $\Phi^{(0)}(3, 15)$, $\Phi^{(1)}(3, 15)$, $\Phi^{(2)}(3, 15)$. The values in shaded boxes
correspond to monomials of the form $x^{n_1}y^{n_2}$ not contained in $L(15P_{(0:0:1)})$ of Klein's quartic curve $x^3y + y^3 + x = 0$ over $\mathrm{GF}(2^3)$ (cf. later sections).



Then we can obtain $\mathcal{E}$ as the set $\subset \{P_j\}_{1 \le j \le n}$ of common
zeros of all the polynomials in the Gröbner basis.
For $A \in \mathbb{Z}_0$ and $0 \le i < a$, let

\[ \Phi^{(i)}(A) := \{ n = (n_1, n_2) \in \mathbb{Z}_0^2 \mid i \le n_2 < i + A \} \]

and $\Phi(A) := \Phi^{(0)}(A)$. Moreover, for $A' \in \mathbb{Z}_0$, let

\[ \Phi^{(i)}(A, A') := \{ n \in \Phi^{(i)}(A) \mid o(n) \le A' \} \]

and $\Phi(A, A') := \Phi^{(0)}(A, A')$. Fig. 2 illustrates $\Phi(2a - 1, A')$
and $\Phi^{(i)}(a, A')$ for $A' = 15$ and $(a, b) = (3, 2)$; although
we defined $a < b$, this must be generalized to $a > b$ in
the case of the well-known Klein's quartic curve, which is one of
the important examples not contained in $C_a^b$ curves; we will
also take up codes on this curve later in Section V. We note
that $o(n) \neq o(n')$ if and only if $n \neq n'$ for $n, n' \in \Phi^{(i)}(a)$,
and this is false for $\Phi(2a - 1)$. Thus $F \in K[\mathcal{X}]$ is uniquely
expressed as

\[ F(x, y) = \sum_{n \in \Phi(a)} F_n\, x^{n_1} y^{n_2}. \tag{4} \]

We denote $x^{n_1}y^{n_2}$ by $z^n$ and define $o(n) := o(z^n) = n_1 a + n_2 b$,
where $o(\cdot)$ is defined on both $\mathbb{Z}_0^2$ and $K[\mathcal{X}]$; we recall
that $o(F) = \max\{o(n) \mid F_n \neq 0\}$.

From a given received word $(r_j)$, we calculate syndrome
values $\{u_l\}$ for $l \in \Phi(2a - 1, m)$ by $u_l = \sum_{j=1}^{n} r_j z^l(P_j)$,
where we have $u_l = \sum_{\gamma=1}^{t} e_{j_\gamma} z^l(P_{j_\gamma})$ by the definition of
$C(m)$. Our aim is to find $I(\mathcal{E})$ and $(e_j)$ with $\{u_l\}$.

III. Inverse-free BMS algorithm 

We continue to prepare notations to describe the algorithm.
The standard partial order $\le$ on $\mathbb{Z}_0^2$ is defined as follows: for
$n = (n_1, n_2)$ and $n' = (n'_1, n'_2) \in \mathbb{Z}_0^2$, $n \le n' \Leftrightarrow n_1 \le n'_1$
and $n_2 \le n'_2$. For $l \in \Phi(a, A')$, let $l^{(i)} \in \Phi^{(i)}(a, A')$ be such that
$o(l^{(i)}) = o(l)$ if there exists such an $l^{(i)}$ for $l$ and $i$. Then $l^{(i)}$
is uniquely determined for each $l$ and $i$ if it exists. Note that
$l^{(0)} = l$ from its definition. Table I illustrates $l^{(i)} \in \Phi^{(i)}(3, 15)$
for $(a, b) = (3, 2)$, where $*$ indicates the nonexistence of $l^{(i)}$
from a gap-number in $o(\Phi^{(i)}(a))$.

Before the description of the algorithm, we introduce the
important index $\bar\imath$ for $0 \le i < a$ for updating in the algorithm.
For $0 \le i < a$ and $N \in \mathbb{Z}_0$, we define a unique integer
$0 \le \bar\imath < a$ by $\bar\imath \equiv b^{-1}N - i \pmod{a}$, where the integer
$0 \le b^{-1} < a$ is defined by $b\,b^{-1} \equiv 1 \pmod{a}$. If there is
$l^{(i)} = (l_1^{(i)}, l_2^{(i)}) \in \Phi^{(i)}(a)$ with $N = o(l^{(i)})$, then $\bar\imath = l_2^{(i)} - i$
since $l_2^{(i)} \equiv b^{-1}N \pmod{a}$. Note that $\bar{\bar\imath} = i$, and that $l^{(i)}$ exists
if and only if $l^{(\bar\imath)}$ exists with $l^{(i)} = l^{(\bar\imath)}$.
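
The index bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `l_of` and `bar` are hypothetical, the pole order is taken as $o(n) = n_1 a + n_2 b$, and gaps are reported as `None` instead of `*`.

```python
def l_of(N, i, a, b):
    """Return l^(i) = (l1, l2) with i <= l2 < i + a and l1*a + l2*b = N.

    Returns None when N is a gap, i.e. when no such l^(i) exists.
    """
    for l2 in range(i, i + a):
        rest = N - l2 * b
        if rest >= 0 and rest % a == 0:
            return (rest // a, l2)
    return None


def bar(i, N, a, b):
    """Return the index i-bar = b^{-1} * N - i (mod a)."""
    b_inv = pow(b, -1, a)  # modular inverse; needs Python 3.8 or later
    return (b_inv * N - i) % a


# First components l1 of l^(i) for an elliptic curve, (a, b) = (2, 3), N = 0..8.
row0 = [None if l_of(N, 0, 2, 3) is None else l_of(N, 0, 2, 3)[0] for N in range(9)]
row1 = [None if l_of(N, 1, 2, 3) is None else l_of(N, 1, 2, 3)[0] for N in range(9)]
print(row0)
print(row1)
```

For $(a, b) = (2, 3)$ the two printed rows list $l_1^{(0)}$ and $l_1^{(1)}$ for $N = 0, \ldots, 8$; the same function with $(a, b) = (3, 2)$ gives the pole-order bookkeeping behind Table I.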



TABLE I

VALUES OF $l^{(i)} = (l_1^{(i)}, l_2^{(i)}) \in \Phi^{(i)}(3, 15)$ WITH $o(l^{(i)}) = N$
We define the degree $\deg(F) \in \Phi(a)$ of $F \in K[\mathcal{X}]$ uniquely
by $o(\deg(F)) = o(F)$, and let $s := \deg(F)$. From now on,
$\Phi(a, o(s))$ is abbreviated to $\Phi(a, s)$. Defining, for $l \in \Phi(a)$,

\ otherwise, 

where "otherwise" includes the vacant case, we call
$dF_l$ the discrepancy of $F \in K[\mathcal{X}]$ at $l$. Let $V(u, N)$ be the set of
$F \in K[\mathcal{X}]$ whose discrepancies are zero at all $l \in \Phi(a, N)$,
and let $V(u, -1) := K[\mathcal{X}]$. Then, for all $N \in \mathbb{Z}_0 \cup \{-1\}$,
$V(u, N)$ is an ideal in the ring $K[\mathcal{X}]$ (as proved in Proposition
1 in Appendix A). The BMS algorithm computes a Gröbner
basis of $V(u, N)$ for each $N$, namely, a minimal polynomial
ideal-basis with respect to the pole order $o(\cdot)$. We may
express the basis of $V(u, N)$ for each $N$ as $a$ polynomials
$\{F_{N+1}^{(i)}(z)\}_{0 \le i < a}$ by (4). For sufficiently large $B$, we have
$V(u, B) = I(\mathcal{E})$ (proved in Proposition 3 in Appendix B).
Then $\{F_{B+1}^{(i)}(z)\}$ are called error-locator polynomials, and the
set of their common zeros agrees with $\mathcal{E}$. Since the Goppa
designed distance $d_G$ of $C(m)$ equals $m - 2g + 2$, we may set

\[ m := 2t + 2g - 1 \quad \text{for the correction up to } t \text{ errors,} \tag{6} \]

and can obtain $V(u, m)$ by using $\{u_l\}_{l \in \Phi(a, m)}$.
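
As a quick check of (6), consider an elliptic code (genus $g = 1$) designed for $t = 3$ errors, which is the parameter set used in the simulation of Section IV-C. The arithmetic below only substitutes these values into the formulas stated above.

```latex
\begin{align*}
m   &= 2t + 2g - 1 = 2\cdot 3 + 2\cdot 1 - 1 = 7, \\
d_G &= m - 2g + 2 = 7 - 2 + 2 = 7 = 2t + 1 .
\end{align*}
```

The Goppa designed distance is therefore exactly $2t + 1$, the value needed for correcting $t$ errors.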

In the following inverse-free BMS algorithm, we denote the
preserved condition (P) for updating formulae as follows:

\[ (P) \iff d_N^{(i)} = 0 \ \text{ or } \ s_N^{(i)} \ge l^{(i)} - c_N^{(\bar\imath)}. \]

Inverse-free BMS Algorithm

Input: syndrome values $\{u_l\}$ for $l \in \Phi(2a - 1, m)$.
Output: error-locator polynomials $\{F_{m+1}^{(i)}(z)\}$.

In each step, the indicated procedures are carried out
for all $0 \le i < a$.
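Step 0 (initializing) $N := 0$, $s_N^{(i)} := \cdots$

Step 1 (checking discrepancy) If $l^{(i)}$ exists and $s_N^{(i)} \le l^{(i)}$,
then $d_N^{(i)} := v_{N,N}^{(i)}$, else $d_N^{(i)} := 0$; moreover, $e_N^{(i)} := w_{N,N}^{(i)}$.

Step 2 ($N$-updating)

\[ s_{N+1}^{(i)} := \begin{cases} s_N^{(i)} & \text{if (P),} \\ l^{(i)} - c_N^{(\bar\imath)} & \text{otherwise,} \end{cases} \tag{7} \]

\[ c_{N+1}^{(\bar\imath)} := \begin{cases} c_N^{(\bar\imath)} & \text{if (P),} \\ l^{(i)} - s_N^{(i)} & \text{otherwise,} \end{cases} \tag{8} \]

\[ f_{N+1}^{(i)} := e_N^{(\bar\imath)} f_N^{(i)} - d_N^{(i)} g_N^{(\bar\imath)}, \tag{9} \]

\[ g_{N+1}^{(\bar\imath)} := \begin{cases} g_N^{(\bar\imath)} & \text{if (P),} \\ f_N^{(i)} & \text{otherwise,} \end{cases} \tag{10} \]

\[ v_{N+1}^{(i)} := e_N^{(\bar\imath)} v_N^{(i)} - d_N^{(i)} w_N^{(\bar\imath)} \bmod z^N, \tag{11} \]

\[ w_{N+1}^{(\bar\imath)} := \begin{cases} w_N^{(\bar\imath)} & \text{if (P),} \\ v_N^{(i)} & \text{otherwise.} \end{cases} \tag{12} \]

Step 3 (checking termination) If $N < m$, then $N := N + 1$
and go to Step 1, else stop the algorithm. □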



In the formula (11), "mod $z^N$" means that $v_{N+1}^{(i)}$ is defined
by omitting the term of $z^N$ in $v_N^{(i)}$. Then $v_N^{(i)}$ and $w_N^{(i)}$ can be



represented by 



and v'^^ jq, ''^pfN defined by these. We obtain {Fj^\z)} 
through 



E 

iS<E>(a,. 



with 



Then $d_N^{(i)}$ in the algorithm agrees with the discrepancy of $F_N^{(i)}$
at $l$ with $o(l) = N$, i.e., $d_N^{(i)} = d(F_N^{(i)})_l$.

This inverse-free BMS algorithm is a novel version that
eliminates the inverse calculation $(d_N^{(i)})^{-1}$ from the parallel
BMS algorithm [16], [27]. Compared with the updating formulae
in the original algorithm, which are quoted later at (16)-(19),
we see that (9)-(12) have eliminated the use of divisions,
and in consequence use $e_N^{(i)}$. It is possible that one
could remove the inverse calculation from the original (not
parallel) BMS algorithm if the values of $e_N^{(i)}$, which are
actually previous values of $d_N^{(i)}$, are stored in memory
elements; in our parallel inverse-free BMS algorithm, we can
conveniently take $e_N^{(i)}$ from the coefficients of $w_N^{(i)}$ (as done
in Step 1).

The following theorem confirms that $\{F_N^{(i)}\}_{0 \le i < a}$ is a
Gröbner basis of $V(u, N - 1)$.



Theorem 1: We have F^'^ e V{u,N-l), deg(F^'^) 



■ N 

'^NS — "NS — 



'AT > 



(i) 

— mm ■ 



> s^ ;^^\ and 



1) 




F G V{u,N - 
deg(F)= (Cgi,z) 

The proof of Theorem [T| is referred to Appendix |D1 in which 
1 ~ '^N 1 + 1 is also obtained for all N and i. 
As explained in Proposition 3 in Appendix B, the integer $B$
is required to satisfy $B \ge 2t + 4g - 2 + a$ to correct up to $t$ errors.
Moreover, it is well-known [3], [26] that the determination of
unknown-syndrome values has to be done to proceed with the loops
for $N = m + 1, m + 2, \cdots, B$ of the BMS algorithm. In our
Theorem 1, as a result of being division-less, "$F_{N,s}^{(i)} = 1$" is not
generally true, unlike in Theorem 1 of [16], and this fact
prevents us from generating the candidate values of unknown
syndromes for majority voting. Therefore, in our inverse-free
BMS algorithm, we avoid the determination of unknown
syndromes, and the loops of the algorithm are executed only
for $0 \le N \le m$ by using the known syndrome values obtained
directly from the received word. Furthermore, we mainly
consider the error-correction of generic errors [5], [23] (defined
in the next section). These techniques cause a slight decrease
in the error-correcting capability; however, as described later
in Section IV-B, it does not matter in practice.

IV. Inverse-free architecture 

As the first of three kinds of architectures proposed in this 
paper, we describe inverse-free architecture, which has the 
plainest structure of the three. 

A. Model structure 

In this subsection, we give a direct application of the
inverse-free BMS algorithm, which corresponds to Kötter's
architecture [7] in which the inverse-calculators have been replaced
by multipliers. To make the case clear, we describe the
architecture for elliptic codes, that is, codes on elliptic curves,
although we take the generality into account; we can employ
it for other codes on algebraic curves without difficulty.



As shown in the model Fig. 3, the coefficients of $v_N^{(i)}$, $f_N^{(i)}$
are arranged in a sequence of shift-registers, and those of $w_N^{(i)}$,
$g_N^{(i)}$ are arranged in another sequence. As in Kötter's
architecture [7], the proposed architecture has an $a$-multiple
structure (i.e. $a$ blocks) of the architecture for the Berlekamp-Massey
algorithm [1], [11] of RS codes. The difference is that
$a$ division-calculators in Kötter's architecture are replaced
with $a$ multipliers in our architecture. Moreover, while the
values of discrepancy are computed in Kötter's architecture
with one multiplier and a shift-register according to definition
(5), our architecture derives the values from the coefficients of
$v_N^{(i)}$ with discrepancy registers and saves the one multiplier
for computing discrepancy.

In Fig. 3, we omit input and output terminals, and the
initial ($N = 0$) arrangement of the coefficients in polynomials
is indicated. The number of registers in one shift-register
sequence for $v_N^{(i)}$ and $f_N^{(i)}$ should be equal to the total number
of coefficients in $v_N^{(i)}$ and $f_N^{(i)}$, i.e., $m + 2$ for $C(m)$; although
Fig. 3. Inverse-free architecture for elliptic codes, which is composed of $a = 2$ blocks exchanging $w_N^{(i)}$ and $g_N^{(i)}$.



1.  % initializing
2.  ll := [0 * 1 0 2 1 3 2 4;
           * * * 0 * 1 0 2 1];
3.  v_f_0 := [10 -1 14 12 7 5 1 8 9 0]; % 10 registers
4.  v_f_1 := [-1 -1 -1 12 -1 5 2 8 10 0]; % 10 registers
5.  w_g_0 := [0 -1 ... -1]; % 11 registers
6.  w_g_1 := [0 -1 ... -1]; % 11 registers
7.  d_0 := -1; d_1 := -1; t_0 := -1; t_1 := -1;
8.  N := -1; S := [0 0]; C := [-1 -1]; PS := S;
9.  T := [-1 -1]; M := [-1 -1];
10. % start of main clock loop
11. for clo = 0 to 11*9-1;
12.   if mod(clo,11) = 0; N := N+1; end;
13.   print [v_f_0, v_f_1, clo, w_g_0, w_g_1];
14.   if mod(clo,11) = 0;
15.     b0 := mod(clo/11,2)+1; b1 := mod(1+clo/11,2)+1;
16.     if S(1) <= ll(1,N+1); d_0 := v_f_0(1);
17.     else; d_0 := -1;
18.     end; t_0 := w_g_0(1);
19.     if S(2) <= ll(2,N+1); d_1 := v_f_1(1);
20.     else; d_1 := -1;
21.     end; t_1 := w_g_1(1);
22.   end;
23.   v_f_0_t := v_f_0(1); v_f_1_t := v_f_1(1);
24.   w_g_0_t := w_g_0(1); w_g_1_t := w_g_1(1);
25.   v_f_0(1:9) := v_f_0(2:10); v_f_1(1:9) := v_f_1(2:10);
26.   w_g_0(1:10) := w_g_0(2:11); w_g_1(1:10) := w_g_1(2:11);
27.   if mod(clo,11) = 0; v_f_0(10) := -1; v_f_1(10) := -1;
28.   else; % updating of v and f
29.     v_f_0(10) := t_0 ⊗ v_f_0_t ⊕ d_0 ⊗ w_g_0_t;
30.     v_f_1(10) := t_1 ⊗ v_f_1_t ⊕ d_1 ⊗ w_g_1_t;
31.   end;
32.   if mod(clo,11) = 0; % updating of S and C
33.     if d_0 < 0 or S(1) >= ll(1,N+1) - C(b0);
34.       ns := S(1); nc := C(b0);
35.     else; ns := ll(1,N+1) - C(b0); nc := ll(1,N+1) - S(1);
36.       T(b0) := S(1); M(b0) := N;
37.     end; S(1) := ns; C(b0) := nc;
38.     if d_1 < 0 or S(2) >= ll(2,N+1) - C(b1);
39.       ns := S(2); nc := C(b1);
40.     else; ns := ll(2,N+1) - C(b1); nc := ll(2,N+1) - S(2);
41.       T(b1) := S(2); M(b1) := N;
42.     end; S(2) := ns; C(b1) := nc;
43.   end;
44.   % updating of w and g
45.   if 8-N <= mod(clo,11) <= 8-M(b0) and N<8;
46.     w_g_1(11) := -1;
47.   elseif S(b0) = PS(b0); w_g_1(11) := w_g_0_t;
48.   else; w_g_1(11) := v_f_0_t; end;
49.   if 8-N <= mod(clo,11) <= 8-M(b1) and N<8;
50.     w_g_0(11) := -1;
51.   elseif S(b1) = PS(b1); w_g_0(11) := w_g_1_t;
52.   else; w_g_0(11) := v_f_1_t; end;
53.   if mod(clo,11) = 10; PS := S; end;
54. end; % end of main clock loop

Fig. 4. Program simulating the inverse-free architecture for (24, 16, 8) elliptic code $C(8)$ over $\mathrm{GF}(2^4)$ with three-error correction.



it might seem that there is no space for $f_N^{(i)}$, it is made by
shortening and shifting of $v_N^{(i)}$ as $N$ is increased. On the other
hand, the number of shift-registers required for $w_N^{(i)}$ and $g_N^{(i)}$
is one more than that for $v_N^{(i)}$ and $f_N^{(i)}$ because of the structure
of the parallel BMS algorithm, and should be $m + 3$.

If the clock count is $\equiv 0 \bmod (m + 3)$, the switches in the discrepancy
registers are closed downward to obtain the values of discrepancy
$v_{N,N}^{(i)} = d_N^{(i)}$, and if it is $\not\equiv 0 \bmod (m + 3)$, they
are closed upward to output the values of discrepancy at
each clock. The head-coefficient registers work similarly to
the discrepancy registers, and output the values of the head
coefficient $w_{N,N}^{(i)} = e_N^{(i)}$ of $w_N^{(i)}$. The coefficients of $w_N^{(i)}$ and
$g_N^{(i)}$ are transferred from the block of $v_N^{(i)}$ to the other block.
The switches A and B work according to the preserving
or updating of $w_N^{(i)}$ and $g_N^{(i)}$, i.e., "(P)" or "otherwise" in (10)
and (12).

Thus, one need only perform simple additions and multiplications
for the values in the shift-register sequences for $v_N^{(i)}$
and $f_N^{(i)}$ to update them. On the other hand, as for $w_N^{(i)}$ and
$g_N^{(i)}$, one must not only perform additions and multiplications
but also set register-values to zero, or else old disused values
corrupt $v_N^{(i)}$ and $f_N^{(i)}$. We describe this procedure in
Subsection IV-C.

This inverse-free architecture has an $a$-multiple structure
closer to Kötter's than to the latter two architectures, and
has been changed to division-free and parallel in the sense
of using two types of polynomials, $v_N^{(i)}$ and $w_N^{(i)}$, to compute
discrepancy. We see in Section VII that the total number of
shift-registers in our architecture is nearly the same as that
in Kötter's, i.e., the additional polynomials do not contribute
essentially to the total number of registers.

B. Decoding of generic errors 

To implement the inverse-free algorithm effectively, we
concentrate on decoding generic $t$-errors [5], [23], for which
the degree $s^{(i)}$ of error-locator polynomials is characterized
by $o(s^{(i)}) \le t + g - 1 + a$, while in general we have
$o(s^{(i)}) \le t + 2g - 1 + a$. In other words, the error-location $\mathcal{E}$ is
generic if and only if the so-called delta set $\{l \in \Phi(a) \mid l_1 < s_1^{(l_2)}\}$
of error-locator polynomials corresponds to the first $t$ non-gaps
in $o(\Phi(a))$. Then the loops of the BMS algorithm are required for
$0 \le N \le m + a - 1$ to obtain the error-locator polynomials
for generic $t$-errors, while in general $0 \le N \le m + 2g - 1 + a$
for all errors; these facts are proved in Appendix C. Thus we
see that $(t - \lceil (a - 1)/2 \rceil)$ errors are corrected in $C(m)$ after
$N$-updating for $0 \le N \le m$. The merits of this method are
not only that it is inverse-free and there is no majority logic
[3] but also that there are fewer loops of the BMS algorithm;
we can cut it down to $2g - 1$ loops. Furthermore, this method
can also be applied to Kötter's and systolic-array architectures.

There are two drawbacks to this method. The first is that
non-generic errors cannot be corrected. Since whether an error is generic or
non-generic is determined by whether a matrix determinant is $\neq 0$
or not (as shown in Appendix C), the ratio of generic errors
to all errors is estimated at $(q - 1)/q$, under the hypothesis
of randomness of the values $\{z^l(P_j)\}$ (which is supported
by numerical tests [12]). For a practical size $q = 2^8$,
the ratio is equal to $255/256 = 0.9960\cdots$. Moreover, for
errors less than $t$, the percentage of correctable errors increases
since the values $o(s^{(i)})$ decrease. Thus this
drawback has little effect. The second is that the number of correctable errors
is decreased by $\lceil (a - 1)/2 \rceil$ for $t$-error correctable codes $C(m)$.
This corresponds to $t - 1$ errors for all elliptic codes, and
$t - 8$ errors for Hermitian codes over $\mathbb{F}_{2^8}$. However, this
has no serious effect on practical function; we may choose
$C(m + a - 1)$ to correct $t$ errors, and the remaining error-correcting
capability is available for error-detection up to
$t + \lfloor (a - 1)/2 \rfloor$ errors. In the next subsection, we demonstrate
the decoding of $C(m + 1)$ (i.e. $C(m + a - 1)$ with $a = 2$) for $t$-error
correction in codes on elliptic curves.
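
The trade-off described in this subsection can be tabulated with a short script. It is a sketch that only evaluates the formulas quoted above ((6), the loss $\lceil (a-1)/2 \rceil$, and the estimated ratio $(q-1)/q$); the function name `generic_decoding_summary` and the dictionary keys are hypothetical.

```python
from fractions import Fraction
from math import ceil


def generic_decoding_summary(t, g, a, q):
    """Evaluate the parameters of generic-error decoding for t errors.

    t: target number of errors, g: genus, a: minimal pole order,
    q: field size. Only the formulas quoted in the text are used.
    """
    m = 2 * t + 2 * g - 1              # equation (6)
    loss = ceil((a - 1) / 2)           # capability given up without majority voting
    return {
        "m": m,
        "corrected_in_C(m)": t - loss,     # errors corrected with loops 0 <= N <= m
        "code_index_for_t": m + a - 1,     # use C(m + a - 1) to keep t-error correction
        "generic_ratio": Fraction(q - 1, q),
    }


elliptic = generic_decoding_summary(t=3, g=1, a=2, q=16)
print(elliptic)
print(float(Fraction(255, 256)))  # estimated generic ratio for q = 2^8
```

For the elliptic example with $t = 3$ this selects $C(8)$, the code simulated in the next subsection.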

C. Simulation and numerical example 

In this subsection, we focus on an elliptic code, especially
on the elliptic curve defined by the equation $y^2 + y = x^3 + x$
over $K := \mathbb{F}_{16}$, and simulate a decoder for it. This curve
has 25 $K$-rational points, equal to the Hasse-Weil bound with
genus one, and we obtain code $C(m)$ of length 24.

We choose a primitive element $\alpha$ of $K$ satisfying $\alpha^4 + \alpha = 1$,
and represent each non-zero element of $K$ by the exponent of the
corresponding power of $\alpha$. Moreover, we represent zero in $K$ as $-1$; note
that, e.g., $0$ and $-1$ mean $1 = \alpha^0$ and $0$, respectively. Let the
set of error-locations be $\mathcal{E} := \{(x, y) = (3, 7), (9, 11), (14, 4)\}$,
and let the error-values be 6, 8, 11, respectively.
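
The exponent representation used in this example can be reproduced with a small lookup-table sketch. The assumptions are stated explicitly: the field is built from $\alpha^4 = \alpha + 1$, the curve is taken to be $y^2 + y = x^3 + x$ (the equation consistent with the three listed error locations), and the helper names `mul`, `add`, `power`, and `on_curve` are hypothetical.

```python
# Exponent ("log") representation of GF(16) with alpha^4 = alpha + 1.
# A nonzero element alpha^k is stored as k in 0..14; zero is stored as -1.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: k for k, v in enumerate(EXP)}
ZERO = -1


def mul(x, y):
    """Product of two elements given in exponent representation."""
    return ZERO if ZERO in (x, y) else (x + y) % 15


def add(x, y):
    """Sum of two elements (characteristic 2, so addition equals subtraction)."""
    s = (0 if x == ZERO else EXP[x]) ^ (0 if y == ZERO else EXP[y])
    return ZERO if s == 0 else LOG[s]


def power(x, k):
    """k-th power (k >= 1) of an element in exponent representation."""
    return ZERO if x == ZERO else (x * k) % 15


def on_curve(x, y):
    """Check the assumed curve equation y^2 + y = x^3 + x."""
    return add(power(y, 2), y) == add(power(x, 3), x)


ELEMENTS = [ZERO] + list(range(15))
affine_points = [(x, y) for x in ELEMENTS for y in ELEMENTS if on_curve(x, y)]
print(len(affine_points))
print([on_curve(x, y) for (x, y) in [(3, 7), (9, 11), (14, 4)]])
```

The affine points found this way, together with the point at infinity, account for the rational points of the curve, and the three error locations of the example lie among them.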

In Fig. 4, we provide a brief description of a MATLAB m-file
program for our architecture, where mod(x, Y) returns
the smallest non-negative integer satisfying x $\equiv$ mod(x, Y)
(mod Y). Comments are written next to "%." At line 2, ll(1 +
i, 1 + N), which corresponds to the (1 + i, 1 + N)-th component
of matrix ll in MATLAB m-file notation, defines $l_1^{(i)}$ with
$N = o(l^{(i)})$ of $l^{(i)} \in \Phi^{(i)}(2, 8)$ to decode 3 errors in $C(8)$
with $m = 8$. In the case ll = * , the logical sentences at
lines 16 and 19 are regarded as false.
Fig. 5. Serial architecture for Klein-quartic codes, which has a single structure with serially-arranged coefficients.



In the case of elliptic codes $C(m + 1)$, the number of
registers for $v_N^{(i)}$ and $f_N^{(i)}$ should be $(m + 1) + 2 = 2t + 4$
by (6), and that for $w_N^{(i)}$ and $g_N^{(i)}$ should be $2t + 5$, as in
lines 3-6 for $t = 3$. At line 15, the value b0 (resp. b1)
corresponds to $\bar\imath$ at $N$ for $i = 0$ (resp. $i = 1$). At lines 25
and 26, the shift-register values are shifted to the neighbors,
and, e.g., "v_f_0(1:9):=v_f_0(2:10)" indicates the shifts of
nine values v_f_0(1):=v_f_0(2), $\cdots$, v_f_0(9):=v_f_0(10), where
v_f_0(n) corresponds to the n-th component of v_f_0.
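
The initial register contents can be cross-checked against the syndrome definition of Section II. The sketch below recomputes $u_l = \sum_\gamma e_{j_\gamma} z^l(P_{j_\gamma})$ for the three example errors and orders the values by pole order $N = 0, \ldots, 8$. It assumes $\alpha^4 = \alpha + 1$, $o(n) = 2n_1 + 3n_2$, and gap positions filled with $-1$; the names `l_of`, `syndrome`, and `register_row` are hypothetical.

```python
# Rebuild GF(16) tables (alpha^4 = alpha + 1); elements are stored as exponents.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: k for k, v in enumerate(EXP)}

# ((x, y), error value), all given as exponents of alpha, as in the example.
ERRORS = [((3, 7), 6), ((9, 11), 8), ((14, 4), 11)]


def l_of(N, i, a=2, b=3):
    """Return l^(i) with i <= l2 < i + a and l1*a + l2*b = N, or None at a gap."""
    for l2 in range(i, i + a):
        rest = N - l2 * b
        if rest >= 0 and rest % a == 0:
            return (rest // a, l2)
    return None


def syndrome(l):
    """Syndrome u_l computed from the error vector, in exponent representation."""
    acc = 0
    for (x, y), e in ERRORS:
        acc ^= EXP[(e + l[0] * x + l[1] * y) % 15]
    return LOG[acc] if acc else -1


def register_row(i, m=8):
    """Syndromes u_{l^(i)} for N = 0..m, with -1 where l^(i) does not exist."""
    row = []
    for N in range(m + 1):
        l = l_of(N, i)
        row.append(-1 if l is None else syndrome(l))
    return row


print(register_row(0))
print(register_row(1))
```

Under these assumptions the two rows reproduce the first nine entries of v_f_0 and v_f_1 listed at lines 3 and 4 of the program; the tenth register holds the trailing 0.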

Table II shows that our architecture outputs the error-locator
polynomials $\{F_{m+1}^{(i)}(z)\}$ and the auxiliary polynomials
$\{G_{m+1}^{(i)}(z)\}$ for $\mathcal{E}$. The top of Table II indicates the indexes of
registers of four shift-register sequences. The center column
indicates the values of "clo" in the program, which corresponds
to the underlying clock of the architecture. The values
of discrepancy are indicated at the left bottom of Table II,
where $-1$ indicates the state that $l^{(i)}$ does not exist or
$s_N^{(i)} > l^{(i)}$. The values of discrepancy $d_N^{(i)}$ are obtained at clo
$= 11N$ from v_f_0(1) or v_f_1(1) if $s_N^{(i)} \le l^{(i)}$. The values of
$s_N^{(i)}$ are indicated at the right bottom of Table II.
The most difficult point in the program is that suitable
register values must be set to $-1$ at lines 45 and 49 so as
not to change the coefficients of $g_N^{(i)}$. Let $t_N^{(i)} := \deg(G_N^{(i)})$
and $M^{(i)}$ be the value of $N$ at which the last updating of
$G_N^{(i)}$ occurred; we have $t_N^{(i)} = s_M^{(\bar\imath)}$ with $\bar\imath$ at $M = M^{(i)}$, and have
$t^{(i)} =$ T(1 + i), $M^{(i)} =$ M(1 + i) in the program. Then, we
claim that the head coefficient of $g_N^{(i)}$
is located at the $(10 - M^{(i)})$-th register of w_g_0 or w_g_1
according to $\bar\imath = 0$ or $1$ if mod(clo,11) = 0. For example,
if clo = 66 and $N = 6$, we can see from $s_{N,1}^{(i)}$ in Table II
that $M^{(0)} = 4$. Then the head coefficient of $g_6^{(0)}$ is in w_g_0(6). As another
example, if clo = 77 and $N = 7$, we can see that $M^{(0)} = 6$,
and then the head coefficient of $g_7^{(0)}$ is in w_g_0(4).

Noting that the value in w_g_0(j) at mod(clo,11) = 0 is
the shifted value at mod(clo,11) = j - 1, e.g., w_g_0(11)
:= w_g_1(1), we obtain the upper and lower conditions of
w_g_0(11) and w_g_1(11) := -1 at lines 45 and 49, since each of the
$N + 1 - M^{(i)}$ values of w_g_0(j) and w_g_1(j) for $j = 9 - N$,
$9 - N + 1$, $\cdots$, $9 - M^{(i)}$ must be $-1$ at mod(clo,11) = 0
in each $w_N^{(i)}$. The condition "N<8" is required to obtain the
values of $e_8^{(i)} := w_{8,8}^{(i)}$ for error-evaluation (stated below).



Thus, the Grobner basis {-F'g"'' 



a 



(1) 



1 s 

a xy 
a"'y + a x + a } of ideal
I{£) has been obtained together with the auxiliary polynomials 
{C'-g^ = a^°x + a^^, G^^' = a^y + a'^x}. We obtain the set 
£ of error-locations through the Chien search, and obtain each 
error-value by O' Sullivan's formula 




(p \ Mi) 



m+l,s 



°m+l 



for R- e £, (15) 



where $F'^{(i)}_{m+1}(z)$ is the formal derivative of $F^{(i)}_{m+1}(z)$ with
respect to $x$, e.g., $y' = x^2 + 1$. Note that the divisions
in this formula are independent of the BMS algorithm, and
are calculated by repeated multiplications using the
multipliers in our architecture as follows.

Since we have $\beta^{-1} = \beta^{2^n - 2}$ for $0 \neq \beta \in \mathbb{F}_{2^n}$, and have
$a_n = 2^n - 1$ for the sequence defined by $a_1 := 1$ and $a_{n+1} :=
2a_n + 1$, we see that the calculation of $\beta^{-1}$ consists of $(n -
2)$ multiplications by $\beta$ and $(n - 1)$ squarings, and the total is
$(2n - 3)$ multiplications in $\mathbb{F}_{2^n}$. Thus we can say that our
architecture eliminates $a$ inverse-calculators, each of which
corresponds to $(2n - 3)$ multipliers, with a slight drop of $\lfloor (a - 1)/2 \rfloor$
in error-correction capability for $C(m + a - 1)$.
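
The multiplication count above can be checked with a short sketch that computes $\beta^{-1} = \beta^{2^n - 2}$ along the sequence $a_{k+1} = 2a_k + 1$ and counts every field multiplication, squarings included. The reduction polynomial $x^8 + x^4 + x^3 + x + 1$ used for $\mathrm{GF}(2^8)$ is an assumption for illustration only (the paper does not name one), and `gf_mul` and `gf_inverse` are hypothetical helper names.

```python
def gf_mul(x, y, n, modulus):
    """Carry-less product of x and y reduced modulo a degree-n binary polynomial."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x >> n:          # degree reached n: reduce
            x ^= modulus
    return r


def gf_inverse(beta, n, modulus):
    """Return (beta^{-1}, number of multiplications used) for beta != 0."""
    count = 0
    acc = beta                            # beta^{a_1}, with a_1 = 1
    for _ in range(n - 2):                # exponent a_{k+1} = 2 * a_k + 1
        acc = gf_mul(acc, acc, n, modulus)
        count += 1
        acc = gf_mul(acc, beta, n, modulus)
        count += 1
    acc = gf_mul(acc, acc, n, modulus)    # final squaring: exponent 2^n - 2
    count += 1
    return acc, count


AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, an assumed irreducible polynomial
inv, used = gf_inverse(0x53, 8, AES_POLY)
print(used, gf_mul(0x53, inv, 8, AES_POLY))
```

For $n = 8$ the counter reports $2n - 3 = 13$ multiplications, which matches the figure of thirteen multiplications quoted in the Introduction.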

V. Serial architecture 

As the second architecture, we describe the serial architecture
[13], which has a different structure from Kötter's and the
preceding ones. In this section, we focus on well-known codes
on Klein's quartic curve over $K := \mathbb{F}_8$, and simulate a decoder
for it. Many articles so far have treated codes on this curve as
examples.

Klein's quartic curve is defined by the equation $X^3Y + Y^3Z +
Z^3X = 0$ in the projective plane $\mathbb{P}^2 = \{(X : Y : Z)\}$, which
yields $y^3x + x^3 + y = 0$ by $(x, y) := (Y/Z, X/Z)$ in the
affine form, and has the same number of $K$-rational points as the
Hasse-Weil-Serre upper bound 24 with genus 3. We denote the
$K$-rational points $(X : Y : Z) = (1 : 0 : 0)$ and $(0 : 1 : 0)$
as $P_{(1:0:0)}$ and $P_{(0:1:0)}$, and the other 22 points as the values
1.  % initializing
2.  ll := [0 * * 1 * 1 2 1 2 3 2 3 4 3 4 5;
           * * * * * 1 * 1 2 * 2 3 2 3 2 3;
           * * * * * * * 1 * * 2 * 2 3 4 3];
3.  v_f_r := [5 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
      6 -1 6 3 -1 -1 2 2 -1 3 -1 3 4 -1 -1 6 6 6 -1 4 3
      3 6 6 6 3 5 3 4 6 6 0]; % 50 registers
4.  w_g_r := [-1 ... -1]; % 54 registers
5.  d_r := [-1 -1 -1]; e_r := [0 -1 -1];
6.  N := -1; S := [0 1 1]; C := [-1 0]; PS := S;
7.  T := [-1 -1 -1]; M := [-1 -1 -1];
8.  % start of main clock loop
9.  for clo = 0 to 54*16-1;
10.   if mod(clo,54) = 0; N := N+1; print [clo v_f_r]; end;
11.   vbi := 1+mod(-clo-N,3); % i of v_f_r(1)
12.   wbi := 1+mod(clo,3); % i of w_g_r(1)
13.   d_r_t := d_r(1); d_r(1:2) := d_r(2:3);
14.   e_r_t := e_r(1); e_r(1:2) := e_r(2:3);
15.   % switching of discrepancy register
16.   if mod(clo,54) = 0, 1, or 2;
17.     if S(vbi) <= ll(vbi,N+1); d_r(3) := v_f_r(1);
18.     else; d_r(3) := -1;
19.     end;
20.   else; d_r(3) := d_r_t;
21.   end;
22.   v_f_t := v_f_r(1); v_f_r(1:49) := v_f_r(2:50);
23.   w_g_t := w_g_r(1); w_g_r(1:53) := w_g_r(2:54);
24.   if mod(clo,54) = 0, 1, or 2; upd := -1;
25.   else; upd := v_f_t ⊕ d_r_t ⊗ w_g_t;
26.   end;
27.   if mod(clo,3) = 0; % switching of exchange register
28.     e_r(3) := upd; v_f_r(50) := e_r_t;
29.   else; v_f_r(50) := upd;
30.   end;
31.   % updating of S and C
32.   if mod(clo,54) = 0, 1, or 2;
33.     if d_r(3) < 0 or S(vbi) >= ll(vbi,N+1)-C(wbi);
34.       ns := S(vbi); nc := C(wbi);
35.     else; ns := ll(vbi,N+1)-C(wbi); nc := ll(vbi,N+1)-S(vbi);
36.       T(wbi) := S(vbi); M(wbi) := N;
37.     end; S(vbi) := ns; C(wbi) := nc;
38.   end;
39.   % updating of w and g
40.   if 45-3*N <= mod(clo,54) <= 47-3*M(wbi); w_g_r(54) := -1;
41.   elseif PS(wbi) = S(wbi); w_g_r(54) := w_g_t;
42.   else; w_g_r(54) := v_f_t ⊘ d_r(3);
43.   end;
44.   if mod(clo,54) = 53; PS := S; end;
45. end; % end of main clock loop

Fig. 6. Program simulating the serial architecture for (23, 10, 11) code $C(15)$ on Klein's quartic over $\mathrm{GF}(2^3)$ with four-error correction.



of $(x, y)$. Although it is not a $C_a^b$ curve, the monomial basis
of $L(mP_{(0:1:0)})$ to make $C(m)$ is obtained by $\{x^{n_1}y^{n_2} \mid n \in
\Phi(3, m)\} \setminus \{y, y^2\}$ with $o(n) := 3n_1 + 2n_2$ and the minimal
pole order $a = 3$ as in Fig. 2. We note that $x(P_{(1:0:0)}) =
(xy)(P_{(1:0:0)}) = 0$ and $(xy^2)(P_{(1:0:0)}) = 1$, and then obtain
code $C(m)$ of length 23.

We intend to correct generic errors in $C(m + 2)$ with $m :=
2t + 5$ (cf. IV-B). Let a primitive element $\alpha$ of $K$ be given by $\alpha^3 + \alpha = 1$.
We represent each non-zero element of $K$ by its exponent as a
power of $\alpha$, as in IV-C. Let the set of error-locations be $\mathcal{E} :=
\{(x, y) = (0, 1), (1, 0), (2, 0), (3, 3)\}$, and let the error-values be
1, 2, 5, 4, respectively.
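As a sanity check of this setup, the following Python sketch counts the affine points of $y^3x + x^3 + y = 0$ over $\mathbb{F}_8$ and tests the four error-locations given in exponent notation. The helper names are illustrative, and the convention that $-1$ stands for the zero element is taken from the register values used in the simulation programs.

```python
# Sketch: Klein's quartic over GF(8) with alpha^3 + alpha = 1.
# Field elements are 3-bit integers; helper names are hypothetical.
MOD = 0b1011  # x^3 + x + 1

def gf8_mul(a, b):
    """Multiply two GF(8) elements."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= MOD
    return r

def gf8_pow(a, e):
    """Raise a to the non-negative integer power e."""
    r = 1
    for _ in range(e):
        r = gf8_mul(r, a)
    return r

def from_exponent(k):
    """Exponent notation: k >= 0 means alpha^k, and -1 means zero."""
    return 0 if k < 0 else gf8_pow(0b010, k)

def on_klein(x, y):
    """Affine equation y^3 x + x^3 + y = 0 (addition is XOR)."""
    return (gf8_mul(gf8_pow(y, 3), x) ^ gf8_pow(x, 3) ^ y) == 0

affine_points = [(x, y) for x in range(8) for y in range(8) if on_klein(x, y)]
error_locations = [(0, 1), (1, 0), (2, 0), (3, 3)]
errors_on_curve = all(
    on_klein(from_exponent(i), from_exponent(j)) for i, j in error_locations
)
print(len(affine_points), errors_on_curve)
```

The 22 affine points together with $P_{(1:0:0)}$ give the code length 23 used in this section.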

As in the model of Fig. 5, the serial architecture has a single
structure similar to that of RS codes, while Kötter's and
the preceding inverse-free architectures have an $a$-multiple
structure. The initial ($N = 0$) arrangement of the coefficients
in polynomials is also indicated in Fig. 5. In the case of
the architecture for codes on Klein's quartic, it is convenient
to exchange $i$ and $\bar{\imath}$ in all updating formulae (7)-(12), and
the validity follows from $\bar{\bar{\imath}} = i$. For the serial architecture,
we employ not the inverse-free BMS algorithm but the
original parallel BMS algorithm [16], [17], which is described
by exchanging updating formulae (9)-(12) into the following



■iN+l 


— Jn "Af .y Af ' 


(16) 


9 N+1 


_ [ Zg^j!^ if (P), 
\ (dW) i^/W otherwise. 


(17) 


(^) 


= z;(;)-4)^«« modZ", 


(18) 


(i) 


.^i _Zw^N^ _ if(P), 
1 (d^^) Zv-^ otherwise. 


(19) 



Then the coefficients of $v_N^{(i)}$ and $f_N^{(i)}$ are arranged serially
in the order $i = 0, 2, 1$ in one sequence of shift-registers, and
those of $w_N^{(\bar{\imath})}$ and $g_N^{(\bar{\imath})}$ are arranged in the order $\bar{\imath} = 0, 1, 2$
in another. This arrangement of coefficients is decided by the
pair $(i, \bar{\imath})$, and is special to the codes on Klein's quartic; for
codes on $C_a^b$ curves, see the next section.

Instead of the round of $\{w_N^{(\bar{\imath})}, g_N^{(\bar{\imath})}\}$ ($0 \le \bar{\imath} < a$) among
$a$ blocks in the preceding architecture, the order $i = 0, 2, 1$
of $\{v_N^{(i)}, f_N^{(i)}\}$ at $N \equiv 0 \pmod{a}$ is changed to $i = 2, 1, 0$ at
$N \equiv 1$, and to $1, 0, 2$ at $N \equiv 2$, and so on. Although one may
change the order of the coefficients of $\{w_N^{(\bar{\imath})}, g_N^{(\bar{\imath})}\}$, our layout
is easier because of the existence of updating (i.e., the switch
"U" in Fig. 5).

The exchange register has this role of changing the order.
We introduce a method to carry it out with only shift-
registers and switches. The following is a small example; at
mod(clo, 3) = 0, the switch is down to take the leftmost value
in the exchange register, and at other clo's, the switch is up
in order to pass it.

[Diagram: contents of the shift-register and the exchange register at clo = 0, 1, 9, and 12; for example, the values 1 2 3 4 5 6 are reordered to 2 3 1 5 6 4.]

We can see that the exchange register works like a shift-
register, since the order-changing has been finished at clo = 9
and the omission by mod $z^N$ in (11) has been done after $a$
more clo's.

TABLE III
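The switching rule of the small example can be mimicked in a few lines of Python. This is only a sketch modelled on lines 14 and 27-30 of the program in Fig. 6; the function name `exchange_stream` and the use of $-1$ for an empty register are assumptions made for illustration.

```python
# Sketch: a 3-stage exchange register placed in front of a shift-register.
# At clo % period == 0 the incoming value is stored and the leftmost stored
# value is released; at all other clocks the incoming value passes through.

def exchange_stream(inputs, period=3):
    e_r = [-1] * period          # exchange register, -1 marks "empty"
    out = []
    for clo, upd in enumerate(inputs):
        e_r_t = e_r[0]           # leftmost value before shifting
        e_r[:-1] = e_r[1:]       # shift left by one stage
        if clo % period == 0:    # switch down: take value from the register
            e_r[-1] = upd
            out.append(e_r_t)
        else:                    # switch up: pass the incoming value
            out.append(upd)
    return out

print(exchange_stream([1, 2, 3, 4, 5, 6, 7, 8, 9, -1]))
```

Under these assumptions every value arriving at a clock divisible by 3 is delayed by three clocks, which turns each group 1 2 3 into 2 3 1.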

The number of registers in one shift-register sequence
for the $v_N^{(i)}$'s and $f_N^{(i)}$'s should be equal to the total number of
coefficients minus one, i.e., $3(m + 2) - 1$ for $C(m)$, and this
works like $3(m + 2)$ together with the exchange registers. On
the other hand, the $w_N^{(i)}$'s and $g_N^{(i)}$'s require $a$ more shift-registers
than the $v_N^{(i)}$'s and $f_N^{(i)}$'s because of the structure of the parallel BMS
algorithm. Thus the number of registers for the $w_N^{(i)}$'s and $g_N^{(i)}$'s
should be $3(m + 2) + 3$. Then $6t + 26$ and $6t + 30$ registers
are required for $C(m + 2)$ with $m = 2t + 5$.
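The register counts can be reproduced numerically. The sketch below generalizes the two formulas from $a = 3$ to an arbitrary $a$; that generalization is an assumption, chosen because it also matches the 103 and 108 registers appearing in the Hermitian-code program of Fig. 8.

```python
# Sketch: register counts of the two shift-register sequences.
# code_index is the index M of the code C(M) actually decoded, e.g.
# M = m + 2 = 2t + 7 for Klein's quartic and M = m + 3 = 2t + 14 for the
# Hermitian example. The general-a form is an assumption.

def serial_register_counts(a, code_index):
    n_coeff = a * (code_index + 2)
    v_f_registers = n_coeff - 1      # one less, compensated by the exchange register
    w_g_registers = n_coeff + a      # a more than a*(code_index + 2)
    return v_f_registers, w_g_registers

t = 4
print(serial_register_counts(3, (2 * t + 5) + 2))   # Klein's quartic, t = 4
t = 5
print(serial_register_counts(4, (2 * t + 11) + 3))  # Hermitian code, t = 5
```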

In Fig. 6, we describe the architecture with a MATLAB m-
file program, where the notations are the same as in Fig. 4. At
line 6, the values of $[s^{(0)}_{0,1}, s^{(1)}_{0,1}, s^{(2)}_{0,1}]$ and $[c^{(0)}_{0,1}, c^{(1)}_{0,1}, c^{(2)}_{0,1}]$
are initialized differently from all 0 and $-1$ because of the
exclusion of $\{(0, 1), (0, 2)\}$ from $\Phi(3)$.

The most difficult point in the program is again that suitable
register values should be set to zero at line 40 in the
successive loop so that they do not meet the coefficients of $f_N^{(i)}$. Since
$\alpha^0 = f^{(0)}_{0,0}$ is at the 49-th register in the initial values of
v_f_r, we claim that the head coefficient of $g_N^{(i)}$ is
located at the $(49 - 3M^{(i)})$-th register of w_g_r if mod(clo,54)
= $i$. For example, if clo = 648 and $N = 12$, we can
see from $s_N^{(i)}$ in Table III that $M^{(0)} = M^{(1)} = 11$. Then
the head coefficients of $g_{12}^{(0)}$ and $g_{12}^{(1)}$ are in w_g_r(16) at clo = 648 and 649.
Similarly as in Subsection IV-C, we note that the value in
w_g_r(j) at mod(clo,54) = $i$ is the shifted value at mod(clo,54)
= $i + j - 1$, e.g., w_g_r(54) := v_f_r(1). Moreover, since each of the
$N + 1 - M^{(i)}$ values of w_g_r(j) for $j = 46 - 3N, 46 - 3N + 3,
\ldots, 46 - 3M^{(i)}$ must be $-1$ at mod(clo,54) = $i$ in each
$w_N^{(i)}$, we obtain the upper and lower conditions of w_g_r(54) := $-1$
at line 40 as the union of

$i = 0 \Rightarrow j = 45 - 3N, \ldots, 45 - 3M^{(0)}$,
$i = 1 \Rightarrow j = 46 - 3N, \ldots, 46 - 3M^{(1)}$,
$i = 2 \Rightarrow j = 47 - 3N, \ldots, 47 - 3M^{(2)}$.
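The union of the three residue families can be enumerated directly and compared with the single interval test at line 40. The following Python sketch does this under the simplifying assumption that the per-index lookup `M(wbi)` of the program is replaced by a plain list `M`; the function name is illustrative.

```python
# Sketch: clock residues mod 54 at which w_g_r(54) is forced to -1.
# For each i, the residues are base + i - a*n for n = M[i], ..., N.
# base = 45 and a = 3 correspond to the Klein-quartic program of Fig. 6.

def zeroing_residues(N, M, a=3, base=45):
    residues = set()
    for i in range(a):
        residues.update(base + i - a * n for n in range(M[i], N + 1))
    return sorted(residues)

# Example from the text: N = 12 with M^(i) = 11 for every i.
N, M = 12, [11, 11, 11]
print(zeroing_residues(N, M))
print(list(range(45 - 3 * N, 47 - 3 * M[0] + 1)))  # interval tested at line 40
```

When all $M^{(i)}$ coincide, the union is exactly the interval $45 - 3N \le$ mod(clo,54) $\le 47 - 3M^{(i)}$; the same helper with `a=4, base=100` reproduces the window $100 - 4N, \ldots, 103 - 4M^{(i)}$ of the Hermitian program.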



Thus we have obtained the error-locator polynomials 



(0) 



16 
(1) 



+ x'^ + a^xy + a^x + a, 



F^g ~ X y + ax + a xy + a x + a , 
Fif = xy'^ + a^x^ + xy + ol'x + a^, 

whose common zeros in the rational points decide E, and the 
auxiliary polynomials 



G 



(1) _ 

16 ~ 



0, 



G 



(2) 
16 



OL^X^ + O^X + OL^ . 



Then we obtain each error-value by O'Sullivan's formula [24]

$$ e_j = \Bigl( \sum_{0 \le i < a} F'^{(i)}_{m+1}(P_j)\, G^{(i)}_{m+1}(P_j) \Bigr)^{-1} \quad \text{for } P_j \in \mathcal{E}, $$

Fig. 7. Serial inverse-free architecture for Hermitian codes, which is the closest to the RS-code error-locator ones.



1. % initializing

'q * * * 



1 
1 

* 



3 2 10 

* 2 1 

* * 1 

* * * Q 



4 3 2 

* 3 2 

* * 2 



1 5 4 3 2 6 

1 4 3 2 1 
10*321 
10**21 



v_f_r 
1 1 -1 -1 
-1 -1 -1 



[1 



1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 
1 -1 -1 -1 -1 -1 -1 -1 12 -1 -1-16 6 -1 -1 13 13 13 -1 
-1 9 -1 -1 -1 2 2 -1 -1 9 9 9 -1 5 5 5 5 14 -1 -1 -1 9 9 -1 
-1 4 4 4 -1 2 2 2 2 12 13 13 13 -1 -1 2 2 2 -1 14 14 14 14 -1 6 
6 6 0 0]; % 103 registers
4. w_g_r := [0 -1 ... -1]; % 108 registers
5. d_r := [-1 -1 -1 -1]; h_r := [-1 -1 -1 -1]; e_r := [0 -1 -1 -1];
6. sv_r := [-1 -1 -1 -1]; sw_r := [-1 -1 -1 -1];
7. N := -1; S := [0 0 0 0]; C := [-1 -1 -1 -1]; PS := S;
8. T := [-1 -1 -1 -1]; M := [-1 -1 -1 -1];
9. for clo = 0 : 112*25-1; % start of main clock loop
10. if mod(clo,112) = 0; N := N+1; print [clo v_f_r]; end;
11. vbi := 1+mod(clo+N,4); % i of v_f_r(1)
12. wbi := 1+mod(-clo,4); % i of w_g_r(1)
13. sv_r_t := sv_r(1); sv_r(1:3) := sv_r(2:4); sv_r(4) := v_f_r(1);
14. sw_r_t := sw_r(1); sw_r(1:3) := sw_r(2:4); sw_r(4) := w_g_r(1);
15. d_r_t := d_r(1); d_r(1:3) := d_r(2:4); h_r_t := h_r(1);
16. h_r(1:3) := h_r(2:4); e_r_t := e_r(1); e_r(1:3) := e_r(2:4);
17. if mod(clo,112) = 0, 1, 2, or 3;
18. if S(vbi) <= ll(vbi,N+1); d_r(4) := v_f_r(1);
19. else; d_r(4) := -1;
20. end; h_r(4) := w_g_r(1);
21. else; d_r(4) := d_r_t; h_r(4) := h_r_t;
22. end;
23. v_f_r(1:102) := v_f_r(2:103); w_g_r(1:107) := w_g_r(2:108);
24. if mod(clo,112) = 0, 1, 2, or 3; upd := -1;
25. else; upd := h_r_t ⊗ sv_r_t ⊕ d_r_t ⊗ sw_r_t;
26. end;
27. if mod(clo,4) = 0; e_r(4) := upd; v_f_r(103) := e_r_t;
28. else; v_f_r(103) := upd; end;
29. % updating of S and C
30. if mod(clo,112) = 0, 1, 2, or 3;
31. if d_r(4) < 0 or S(vbi) >= ll(vbi,N+1)-C(wbi);
32. ns := S(vbi); nc := C(wbi);
33. else;
34. ns := ll(vbi,N+1)-C(wbi);
35. nc := ll(vbi,N+1)-S(vbi);
36. T(wbi) := S(vbi); M(wbi) := N;
37. end;
38. S(vbi) := ns; C(wbi) := nc;
39. end;
40. % updating of w and g
41. if 100-4*N <= mod(clo,112) <= 103-4*M(wbi) and N<24;
42. w_g_r(108) := -1;
43. elseif PS(wbi) = S(wbi); w_g_r(108) := sw_r_t;
44. else; w_g_r(108) := sv_r_t; end;
45. if mod(clo,112) = 111; PS := S; end;
46. end; % end of main clock loop

Fig. 8. Program simulating the serial inverse-free architecture for (64, 45, 14) Hermitian code over GF(2^4) with five-error correction.



where $F'^{(i)}_{m+1}(z)$ is the formal derivative of $F^{(i)}_{m+1}(z)$ with
respect to $x$, e.g., $y' = (x^2 + y^3)(xy^2 + 1)^{-1}$. The divisions
in (15) are not required in this architecture since the two coefficients
that appear as divisors there have been normalized to $\alpha^0$.
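The example expression for $y'$ follows by implicit differentiation of the affine equation $y^3x + x^3 + y = 0$ of Klein's quartic in characteristic 2. The short derivation below is a worked restatement of that step, not a formula taken from the paper.

```latex
\begin{align*}
0 &= \frac{d}{dx}\left(y^3 x + x^3 + y\right)
   = 3y^2 y' x + y^3 + 3x^2 + y' \\
  &= (x y^2 + 1)\, y' + y^3 + x^2
     \qquad (\text{since } 3 = 1 \text{ in characteristic } 2), \\
y' &= (x^2 + y^3)(x y^2 + 1)^{-1}.
\end{align*}
```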

The definite difference from the preceding one is that the 
serial architecture has a compact structure analogous to the 
RS-code case, with one inverse-calculator for the parallel BMS 
algorithm (not inverse-free). In the next section, we will try 
to remove it from the serial architecture. 

VI. Serial inverse-free architecture 

We describe the serial inverse-free architecture [17], which
has the smallest circuit-scale we have ever obtained and is
the last among the three kinds of proposed architectures. In
this section, we focus on Hermitian codes, that is, codes on
Hermitian curves. These codes over $\mathbb{F}_{256}$ have outstanding
properties and are among the most promising candidates for
practical use. For simplicity, here we simulate the architecture
for a Hermitian code over $K := \mathbb{F}_{16}$. The Hermitian curve
defined by the equation $y^4 + y = x^5$ is one of the $C_a^b$ curves, and has
65 $K$-rational points, equal to the Hasse-Weil upper bound
with genus 6. Then codes on this curve can have code-length
64.

As in the preceding two sections, we intend to correct
generic errors in $C(m + 3)$ with $m := 2t + 11$. The notations
concerning $K$ are the same as in Subsection IV-C.
TABLE IV

Values of registers in two shift-register sequences, discrepancy $d_N^{(i)}$, and $s_N^{(i)}$ in the serial inverse-free architecture.

[Table body: contents of v_f_r (for $v_N^{(i)}$ and $f_N^{(i)}$) and of w_g_r (for $w_N^{(i)}$ and $g_N^{(i)}$) at selected values of clo, including 1904 and 2016, and at output, together with $d_N^{(i)}$ and $s_N^{(i)}$ for each $N$.]


We demonstrate 5-error correction, and set the error-locations
$\mathcal{E} := \{(x, y) = (-1, 0), (5, 3), (9, 8), (10, 13), (12, 2)\}$, and
let the error-values be 11, 13, 2, 12, 9, respectively.
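The chosen error-locations can be checked against the curve $y^4 + y = x^5$. The Python sketch below assumes that $\mathbb{F}_{16}$ is generated by $\alpha^4 = \alpha + 1$ and that $-1$ denotes the zero element in exponent notation; the defining relation of $\alpha$ is fixed in Subsection IV-C and is not restated in this section, so it is an assumption here, and all names are illustrative.

```python
# Sketch: Hermitian curve y^4 + y = x^5 over GF(16).
# Assumption: alpha^4 = alpha + 1; exponent -1 stands for the zero element.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)   # reduce by x^4 + x + 1
LOG = {v: k for k, v in enumerate(EXP)}

def power(v, e):
    """v^e in GF(16) via discrete logarithms."""
    return 0 if v == 0 else EXP[(LOG[v] * e) % 15]

def from_exponent(k):
    return 0 if k < 0 else EXP[k % 15]

def on_hermitian(x, y):
    return (power(y, 4) ^ y) == power(x, 5)

affine_points = [(x, y) for x in range(16) for y in range(16) if on_hermitian(x, y)]
error_locations = [(-1, 0), (5, 3), (9, 8), (10, 13), (12, 2)]
errors_on_curve = all(
    on_hermitian(from_exponent(i), from_exponent(j)) for i, j in error_locations
)
print(len(affine_points), errors_on_curve)
```

The 64 affine points give the code length 64 stated above; the 65th rational point is the point at infinity.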

As shown in the model of Fig. 7, the serial inverse-free
architecture also has the same single structure as that of RS
codes. Initially, the coefficients of the $v_N^{(i)}$'s and $f_N^{(i)}$'s are arranged
serially in the order $i = 0, 1, 2, 3$ in a sequence of shift-
registers, and those of the $w_N^{(\bar{\imath})}$'s and $g_N^{(\bar{\imath})}$'s are arranged in the order
$\bar{\imath} = 0, 3, 2, 1$ in another. This arrangement of coefficients is
decided by the pair $(i, \bar{\imath})$ with $i + \bar{\imath} \equiv 0 \pmod{4}$, and in general
for other codes on $C_a^b$ curves, one can also arrange them in a
similar manner with $i + \bar{\imath} \equiv 0 \pmod{a}$. Then the exchange
register changes the order $i = 0, 1, 2, 3$ of the $\{v_N^{(i)}, f_N^{(i)}\}$'s at
$N \equiv 0 \pmod{4}$ into $i = 1, 2, 3, 0$ at $N \equiv 1$, ..., $i = 3, 0, 1, 2$
at $N \equiv 3$. In general, for other codes on $C_a^b$ curves, it changes
the order of $i$ so as to keep $i + \bar{\imath} \equiv b^{-1}N \pmod{a}$ as the
definition of $\bar{\imath}$.
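The rotation of the index order with $N$ can be tabulated with a short Python sketch. It follows the index computations `vbi` and `wbi` at lines 11 and 12 of Fig. 8; extending them to general $(a, b)$ through $b^{-1} \bmod a$ is an assumption based on the congruence stated above, and the function name is illustrative.

```python
# Sketch: order of the indices i (for v, f) and i-bar (for w, g) met at
# successive clocks, keeping i + i_bar = b^(-1) * N (mod a).
# Requires Python 3.8+ for the modular inverse pow(b, -1, a).

def register_orders(a, b, N):
    shift = (pow(b, -1, a) * N) % a
    v_order = [(clo + shift) % a for clo in range(a)]   # i of v_f_r(1)
    w_order = [(-clo) % a for clo in range(a)]          # i-bar of w_g_r(1)
    return v_order, w_order

for N in range(4):
    print(N, register_orders(4, 5, N))   # Hermitian curve: a = 4, b = 5
```

For $a = 4$ and $b = 5$ we have $b^{-1} \equiv 1 \pmod{4}$, so the order of $i$ is simply rotated by one position each time $N$ increases.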

In the case of the serial inverse-free architecture, we require
two other sequences of $a$ shift-registers, the supplementary regis-
ters, as in Fig. 7. These do not appear in the algorithm but are
due to technical reasons in the architecture. For example, we
can see in Table IV that the values $s^{(0)}_{17,1} = 2$ and $s^{(1)}_{17,1} = 1$ are
increased to 3 and 2 at the same $N = 18$. For such cases, the
supplementary registers hold the values of the head coefficients
$v^{(i)}_{N,N}$; otherwise, $w^{(\bar{\imath})}_N$ cannot be updated
to $v^{(i)}_{N,N}$.


For the same reason as in the previous architectures, suitable register
values should be set to zero at line 41, where the condition is
derived by taking the supplementary registers into account as
follows. Since $\alpha^0 = f^{(0)}_{0,0}$ is at the 101-st register in the initial
values of v_f_r as seen in line 3, we claim that the head coef-
ficient of $g_N^{(i)}$ is located at the $(101 - 4M^{(i)})$-th register
of w_g_r if mod(clo,112) = $i$. For example, if $N = 18$, we
can see from $s_N^{(i)}$ in Table IV that $M^{(0)} = M^{(1)} = 17$. Then,
in w_g_r(33), the head coefficient of $g_{18}^{(0)}$ is at clo = 2016, and that of $g_{18}^{(1)}$
is at clo = 2019.

Fig. 9. Output of the serial inverse-free architecture, where polynomials are
depicted on $\Phi(4, 9)$.

Similarly as in Section V, we note that the value in w_g_r(j)
at mod(clo,112) = $i$ is the shifted value at mod(clo,112)
= $i + j - 1 + 4$, where "+4" is caused by the supplementary
four shift-registers. Moreover, since each of the $N + 1 - M^{(i)}$ values
of w_g_r(j) for $j = 97 - 4N, 97 - 4N + 4, \ldots, 97 - 4M^{(i)}$
must be $-1$ at mod(clo,112) = $i$ in each $w_N^{(i)}$, we obtain the
upper and lower conditions of w_g_r(108) := $-1$ at line 41
as the union of

$i = 0 \Rightarrow j = 100 - 4N, \ldots, 100 - 4M^{(0)}$,
$\vdots$
$i = 3 \Rightarrow j = 103 - 4N, \ldots, 103 - 4M^{(3)}$.

Thus, the Gröbner basis of the ideal $I(\mathcal{E})$ and the auxiliary
polynomials have been obtained as in Fig. 9, e.g.,

^2^5' = Q;"a;^ + a^°xy + o?x^ + a^y + ax + a^, 

and we obtain each error-value by O'Sullivan's formula (15).

In this manner, we have constructed the smallest-scale
architecture, which, unlike the others, uses the supplementary
registers. In our example, the total number of shift-
registers for polynomials is 215, while that for the supplementary
registers is 8, i.e., 3.7%. Furthermore, this percentage
decreases for larger $t$ and is approximately $1/m$, as seen in
the next section; we have, e.g., $m = 2t + 239$ for the other
Hermitian codes over $\mathbb{F}_{256}$. Hence we can say that the $2a$ shift-
registers for the supplementary registers are reasonably small
in the whole architecture.

VII. Performance estimation 

In this section, we estimate the numbers of multipliers,
calculators for inverse, and registers, and the total running
time. Although the estimation in Section IX of [16] was done
with respect to the upper bound $A = t + 2g - 1 + a$ of the $o(s_N^{(i)})$'s, it
is now convenient to estimate with respect to $m = 2t + 2g - 1$
of the code $C(m)$, since we consider architectures without the
determination of unknown-syndrome values.

We quote the result for the systolic array in [16]; the numbers
of multipliers and calculators for inverse are $2am$ and $am/2$,
respectively, as seen in the upper part of Fig. 4 in [16, p. 3866].
The number of registers and the total running time are $(4m +
9)a/2$ and $m + 1$, respectively.

Kötter's architecture has $3a$ multipliers, $a$ calcula-
tors for inverse, and $a(4A + 5)$ registers, where $A = (m + 1)/2 -
1 + a$ since we restrict correctable errors to the generic errors.
The total running time is $2(A + 1)(m + 1) = (m + 2a + 1)(m + 1)$.

The serial architecture and the serial inverse-free architec-
ture have two multipliers, and the inverse-free architecture has
$2a$ multipliers. Only the serial architecture has a calculator
for inverse, and it has one. The number of registers for
these three architectures is equal to $2a(m + 2)$, where $m + 2$
consists of the number of syndromes including the gaps plus
one for the initial value of $f_N^{(i)}$; we ignore the contribution
of the discrepancy, exchange, and supplementary registers
since these are at most a few multiples of $a$ and disappear
in the order of $m$. The total running time for the inverse-
free architecture agrees with $m + 1$ times the number of
registers in the sequence for $w_N^{(i)}$ and $g_N^{(i)}$, which is equal
to $(m + 1)(m + 2)$. Those for the other two agree with
$a(m + 1)(m + 2)$.
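The estimates of this section can be collected in a small Python sketch for concrete parameters. The dictionary layout and the function name are illustrative assumptions; the formulas are the ones quoted above, evaluated here for Hermitian codes over $\mathbb{F}_{256}$ with $a = 16$ and $m = 2t + 239$.

```python
# Sketch: resource estimates of the architectures for given a and m.
# Assumes m is odd (m = 2t + 2g - 1), so that (m + 1) / 2 is an integer.

def performance(a, m):
    bound_A = (m + 1) // 2 - 1 + a          # bound for generic errors
    regs = 2 * a * (m + 2)                  # common to the three proposed ones
    return {
        "systolic":            dict(mult=2 * a * m, inv=a * m / 2,
                                    regs=(4 * m + 9) * a / 2, time=m + 1),
        "kotter":              dict(mult=3 * a, inv=a,
                                    regs=a * (4 * bound_A + 5),
                                    time=2 * (bound_A + 1) * (m + 1)),
        "inverse_free":        dict(mult=2 * a, inv=0, regs=regs,
                                    time=(m + 1) * (m + 2)),
        "serial":              dict(mult=2, inv=1, regs=regs,
                                    time=a * (m + 1) * (m + 2)),
        "serial_inverse_free": dict(mult=2, inv=0, regs=regs,
                                    time=a * (m + 1) * (m + 2)),
    }

t = 5
table = performance(16, 2 * t + 239)
for name, row in table.items():
    print(name, row)
```

The output makes the trade-off visible: the serial variants keep the number of finite-field calculators constant at the price of a running time that is $a$ times longer.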

We summarize these results in Table V, where we denote
only the terms of the highest orders for $m$ in the estimations. In
addition, there is an architecture between Kötter's and Inverse-
free that employs the parallel BMS algorithm (not inverse-
free); we call this temporarily the parallel-BMS architecture and
add it to the table. For example, in the case of Hermitian
codes over the $2^8$-element finite field, $a$ and $m$ are equal to 16
and $2t + 239$, respectively. Since the numbers of registers in
all architectures have an unchanged order $2am$ in Table V,
we can see that these architectures have optimized their space
complexity.

TABLE V

Performance of various architectures.

Architecture          Number of      Number of            Number of   Running
                      multipliers    inverse-calculators  registers   time
Systolic array        2am            am/2                 2am         m
Kötter's              3a             a                    2am         m^2
Parallel-BMS          2a             a                    2am         m^2
Inverse-free          2a             0                    2am         m^2
Serial                2              1                    2am         am^2
Serial inverse-free   2              0                    2am         am^2

Then we can see in Table V that $a$ multipliers have
been saved from Kötter's to Parallel-BMS, and that $a$
inverse-calculators have been saved from Parallel-BMS to
Inverse-free. Both contribute to the reduction of computational
complexity. Note that the latter reduction is
accompanied in $C(m + a - 1)$ by the slight decrease $\lfloor (a - 1)/2 \rfloor$ of
correctable errors, which can be assigned to error-detection. On the
other hand, the two types of serial architectures have constant
numbers of finite-field calculators, and their running time
is $a$ times longer than that of the non-serial types. Thus our
serializing method has provided a preferred trade-off between
calculators and delay.

VIII. Conclusions 

In this paper, we have proposed the inverse-free paral-
lel BMS algorithm for error-location in decoding algebraic-
geometric codes. Thus we have improved the decoding bound
$t \le \lfloor (d_G - g - 1)/2 \rfloor$ in [6], based on a linear system without
the determination of unknown syndromes for AG codes, to
$t \le \lfloor (d_{FR} - a)/2 \rfloor$ for generic errors, where, e.g., $g = 120$
and $a = 16$ for Hermitian codes over $\mathbb{F}_{2^8}$. Moreover, we have
constructed three kinds of error-locator architectures using our
algorithm. These architectures could not be implemented until the
determination procedure of unknown syndromes was removed
from the error-location algorithm. Our novel algorithm and
architectures have a wide range of applications to Gröbner-
basis schemes in various algebraic-coding situations, such
as the Sudan algorithm [29], the Guruswami-Sudan algorithm [4], the
Koetter-Vardy algorithm [8], and encoding of algebraic codes
[19].

We have aimed to construct our architectures with only
shift-registers, switches, and finite-field calculators. The com-
position of shift-registers is superior to that of RAMs (random-
access memories) in decoding speed, and moreover, our ap-
proach is useful for revealing their regularity.

We can conclude that the error-locator architectures correct-
ing generic errors have now been completed as a whole, from the sys-
tolic array (max. parallelism) to the serial inverse-free ones (min.
parallelism). These architectures enable us to fit the decoder
of the codes to various sizes and speeds in many applications.
It may also be concluded that our methodology, which is the
direct decoding from only the received syndromes, correctly
generalizes the RS-code case.

Appendix A 
Proof that V{u, A) is an ideal 

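We first note that, by (5) and the following lemma,

$$ f \in V(u, A) \iff df_l = 0 \ \text{for } l \in \Phi(a, A)
\iff \sum_{n \in \Phi(a,s)} f_n u_{n+h} = 0 \ \text{for } h \in \Phi(a, A - o(s)). \quad (20) $$

Lemma 1: We have $\{l^{(s_2)} - s \mid l \in \Phi(a, A),\ l^{(s_2)} \ge s\} =
\Phi(a, A - o(s))$. □
Proof. Obviously $\{l^{(s_2)} - s \mid l \in \Phi(a, A),\ l^{(s_2)} \ge s\}$ equals

$$ \{l - s \mid l \in \Phi^{(s_2)}(a, A),\ l \ge s\} = \Phi(a, A - o(s)), $$

where the last equality follows from the correspondence $l - s =:
h \in \Phi(a, A - o(s))$. □
For simplicity, we denote the error-locations in $\mathcal{E}$ and the
corresponding error-values by $P_j$ and $e_j$ ($j = 1, \ldots, t$), without loss of generality. Then we convert
the sum $\sum f_n u_{n+h}$ in (20) as

$$ \sum_{n \in \Phi(a,s)} f_n u_{n+h}
= \sum_{n \in \Phi(a,s)} f_n \sum_{j=1}^{t} e_j z^{n+h}(P_j)
= \sum_{j=1}^{t} e_j z^{h}(P_j) \sum_{n \in \Phi(a,s)} f_n z^{n}(P_j)
= \sum_{j=1}^{t} e_j z^{h}(P_j) f(P_j). \quad (21) $$

Proposition 1: For all $A \in \mathbb{Z}_0$, the set $V(u, A) \subset K[X]$ is
a polynomial ideal. □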

Proof. Suppose that / and g G V{u, A) with s :— deg(/) 
and t dcg(g). Then we show that f+g and z'\f G V{u, A). 
Note that, by dlB . 



d{f + g)i =Y^,{f + g){P,) 



;( = 2+t2)_s_4 



and the last two sums are zero from the assumption and 

{^(^2+t2) _s_t} = $(a, A - o{s) - o{t)) C $(a, A - o(s)), 
^{a,A — o{t)) by Lemma [T] For z^f, note that 

rf(-'7). = E^.(-'/)(^.>''"'^'"^"'(^.) 

= Ee,/(P,>'"^"'^'-^(P,), 

and {/('*2+ft2) _ ^ ^ _ 0(5) _ + /i by Lemma 
[1] Although <I>(a, ^ - o(.s) - o{Kj) + h ^ $(a, A - o(.s)) in 



general, the monomial 2;''°^^''^'"'' is represented as the linear 
combination of elements in {z' | / G $(a, A — o(s))}. Then we 
obtain d{z^ f)i = from the assumption, which completes the 
proof. □ 

Appendix B 
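Proof of $V(u, B) = I(\mathcal{E})$

This follows from the next Corollary and Lemma 2.
Proposition 2: Let $f \in K[X]$ be satisfying

$$ \sum_{h \in \Phi(a,s)} f_h u_{h+l_j} = 0 \quad \text{for } l_j \in \Phi(a) \text{ with } j = 1, \ldots, t $$

and $\det([z^{l_j}(P_{j'})]) \neq 0$. Then $f \in I(\mathcal{E})$ holds. □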
Proo/ Since E/ie*(a,s) A^/i+i is converted as 

Using the Riemann-Roch Theorem, we see that the map

$$ L((t + 2g - 1)P_\infty) \to K^t \quad (f \mapsto [f(P_1), \ldots, f(P_t)]) $$

is surjective. Hence there are $t$ linearly independent vectors of
the form $[z^l(P_1), \ldots, z^l(P_t)]$ for $l \in \Phi(a, t + 2g - 1)$, and
we obtain the following sufficient condition for all errors.
Corollary: Let $f \in K[X]$ be satisfying $\sum_{h \in \Phi(a,s)} f_h u_{h+l} = 0$
for all $l \in \Phi(a, t + 2g - 1)$. Then $f \in I(\mathcal{E})$ holds. □
Lemma 2: We can choose a Gröbner basis $\{f^{(i)}\}_{0 \le i < a}$ of
$I(\mathcal{E})$ as $o(f^{(i)}) \le t + 2g - 1 + a$ for all $i$. □

Proof. First, we notice that an element $f^{(i)}$ of the Gröbner basis
may be determined uniquely by

$$ o(f^{(i)}) = \min \{ o(f) \mid f \in I(\mathcal{E}),\ o(f) \equiv i \bmod a \}. \quad (22) $$

Let $n_i$ be one of $\{t + 2g, t + 2g + 1, \ldots, t + 2g - 1 + a\}$
satisfying $n_i \equiv i \bmod a$. We temporarily denote $\ell(D) :=
\dim L(D)$, where $L(D) := \{f \in K(\mathcal{X}) \mid \mathrm{divisor}(f) +
D \text{ is positive}\} \cup \{0\}$ for a divisor $D$. Since we have

$$ \ell((t + 2g - 1)P_\infty - E) = g, $$
$$ \ell((t + 2g)P_\infty - E) = g + 1, $$
$$ \vdots $$
$$ \ell((t + 2g - 1 + a)P_\infty - E) = g + a, $$

where $E := \sum_{j=1}^{t} P_j$, there is $f \in I(\mathcal{E})$ satisfying $o(f) = n_i$.
Then $o(f^{(i)}) \le n_i$ is obtained by (22), and $\max\{o(f^{(i)}) \mid 0 \le
i < a\} \le \max\{n_i \mid 0 \le i < a\} = t + 2g - 1 + a$ leads to Lemma 2. □

Proposition 3: $B \ge 2t + 4g - 2 + a \implies V(u, B) = I(\mathcal{E})$.
Proof. If $f \in K[X]$ and $s := \deg(f) \le l^{(s_2)}$, then $df_l$ is
converted similarly as (21) to

$$ df_l = \sum_{j=1}^{t} e_j f(P_j) z^{l^{(s_2)} - s}(P_j). $$

Hence, if $f(P_1) = \cdots = f(P_t) = 0$, then we have
$df_l = 0$, and thus $I(\mathcal{E}) \subset V(u, B)$ is obvious. To prove
$\supset$, let $\{f_{B+1}^{(i)}\}_{0 \le i < a}$ be a Gröbner basis of $V(u, B)$, where
"$B + 1$" is for consistency with the previous notation. Since
$I(\mathcal{E}) \subset V(u, B)$, we can choose it as $o(f_{B+1}^{(i)}) \le t + 2g -
1 + a$ from Lemma 2 and its proof. Now we suppose that
d{f_



(i) ) 
B + l) 



/_B+i h^h+iM -s'^''^ ~ ^ ^ ^ ^i'^i B) for n e $(a, t), and moreover, 



with l'^^^ > Sg^j^. Then we have, by Lemma[T] {l^^^ — s = 
B - o{s''g\^)) C $(0, t + 2g - 1). Thus we see that the 
inverse inclusion follows from Corollary of Proposition |2] □ 

Appendix C 
Generic case 

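Let $m_t := \min\{m \in \mathbb{Z}_0 \mid \dim L(mP_\infty) = t\}$; recall that
$\dim L(mP_\infty)$ is equal to the number of $l \in \Phi(a, m)$. If $t > g$,
then we have $m_t = t + g - 1$ since $\dim L((t + g - 1)P_\infty) = t$
and $\dim L((t + g - 2)P_\infty) = t - 1$. However, for $t \le g$,
we have for example $m_6 = 10 < t + g - 1$ for the Hermitian
curve $y^4 + y = x^5$ over $\mathbb{F}_{2^4}$. We define that a $t$-error position
$\mathcal{E}$ is generic if $\det([z^{l_j}(P_{j'})]) \neq 0$ for $P_{j'} \in \mathcal{E}$ and
$l_j \in \Phi(a, m_t)$. If $\mathcal{E}$ is generic, we obtain a Gröbner basis
$\{f^{(i)} = z^{s^{(i)}} - \sum_{l \in \Phi(a, m_t)} f^{(i)}_l z^l\}$ of $I(\mathcal{E})$ by solving

$$ \begin{bmatrix} z^{l_1}(P_1) & \cdots & z^{l_t}(P_1) \\ \vdots & & \vdots \\ z^{l_1}(P_t) & \cdots & z^{l_t}(P_t) \end{bmatrix}
\begin{bmatrix} f^{(i)}_{l_1} \\ \vdots \\ f^{(i)}_{l_t} \end{bmatrix}
= \begin{bmatrix} z^{s^{(i)}}(P_1) \\ \vdots \\ z^{s^{(i)}}(P_t) \end{bmatrix} $$

with $s^{(i)} \in \Phi(a, m_{t+i+1}) \setminus \Phi(a, m_{t+i})$. Then Lemma 2 is
improved to $o(f^{(i)}) \le t + g - 1 + a$ for generic $\mathcal{E}$.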
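The quantity $m_t$ can be computed from the pole orders at $P_\infty$. The Python sketch below assumes that these pole orders are exactly the integers $ai + bj$ with $i, j \ge 0$, which is the case for a $C_a^b$ curve such as the Hermitian curve with $(a, b) = (4, 5)$; the function names are illustrative.

```python
# Sketch: m_t = min{ m : dim L(m P_inf) = t } for a C_a^b curve,
# where dim L(m P_inf) counts the pole orders a*i + b*j not exceeding m.

def pole_order_count(a, b, m):
    orders = {a * i + b * j
              for i in range(m // a + 1)
              for j in range(m // b + 1)
              if a * i + b * j <= m}
    return len(orders)

def m_t(a, b, t):
    m = 0
    while pole_order_count(a, b, m) < t:
        m += 1
    return m

g = 6  # genus of y^4 + y = x^5
for t in (6, 7, 8, 10):
    print(t, m_t(4, 5, t), t + g - 1)
```

For $t = 6$ the sketch returns 10, which is smaller than $t + g - 1 = 11$, while for $t > g$ the two values agree.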

Conversely, if $\det([z^{l_j}(P_{j'})]) = 0$, then the equation
from the linear dependency gives $f \in I(\mathcal{E})$ with $\deg(f) \in
\Phi(a, m_t)$. Thus we see that $\mathcal{E}$ is generic if and only if the
delta set $\{l \in \Phi(a) \mid l < s^{(l_2)}\}$ (footprint in [12]) agrees with
$\Phi(a, m_t)$. Namely, our definition of generic is equivalent to
the definition of generic in |23 1 and that of "independent" in 

aa. 

Proposition 4: Suppose that $\mathcal{E}$ is generic.
If $f \in V(u, m_t + o(f))$, then we have $f \in I(\mathcal{E})$. In particular,
$V(u, m + a - 1) = I(\mathcal{E})$ with $m := 2t + 2g - 1$. □

Proof. Since $\{l^{(s_2)} - s \mid l \in \Phi(a, m_t + o(f)),\ l^{(s_2)} \ge s\}$
agrees with $\Phi(a, m_t)$ by Lemma 1, it follows from Proposition
2. □

Appendix D 
Proof of Theorem 1

Theorem 1 is proved by the following three lemmas.

Lemma 3: Suppose that G{z) G V{u,M - 1), dGk ^ 0, 
and t < fc with t = deg(G), fc G $(*=^Ha,M), and o(fc) = M. 
Moreover, suppose that F{z) G V{u, M) and dFs ^ with 
s = deg(P). Then, at least one condition of si > fci — ti + 1 
and S2 ^ k2 — ^2 holds. □ 

Proof. We suppose that si < ki — ti and S2 ~ fc2 — ^2- 
Since G G V{u, M - 1) and F G V{u, M), we have 

- Yl GrtUn+i-t = Gtui for / G $(*^)(a,M - 1), t < 

ne*(a,t)\{i} 

- ^ P.w,+i_, = F,ui for ? G $("^)(a,M), s < I. 

Since n2 + fc2 — 12 < a — 1 + S2 and n + fc — t>n + s>s for 
n G $(a,<), we have n+k-t G ^('^^'(a, M) and s < n+k-t 



GnUn+k-t 

n^'S>(a,t)\{t} 




rUj--^(^n+k—t)—s 



+ (r+k-s)-t 



rG'I>(a,s)\{s} nG$(a,t)\{t} 



G 



Pc ^ 



re$(a,s)\{s} 



where the last equaUty follows from r+k—s G $'^*^^(a, M— 1) 
and t < r + k — s for r G <I>(a, s)\{s} since ^2 + fc2 — S2 < 
— 1 + ^2 and r + k — s > r+t > t for r G $(a, s), and the last 
sum agrees with GtUk since S2 < ^2 = ^2 + 12 < S2 + a — 1 
and fc G $(^2' (a, M). This contradicts dGfc 7^ 0. □ 
Lemma 4: We have Sj^\ — Cj^\ + 1. □ 
Proof. We prove it by induction. The case of TV = follows 



from the initializing. Assuming s 



(0 

N,l 



Si) 
-N 



1 + I for all i, we 



prove s 



N+IS 



i i + l. We may assume that there is l^^^^ = 



I'-^l It follows that s^^\ > 4' 



Thus we may assume that s'^^\ < l\''—c)^\, s)^\ < t[' —c), 
and d!"^ ^ without loss of generality. If — 0, then 
it contradicts Lemma [3] since P^'^ G V{u,N - 1), p;^'' G 
V{u,N), ^,,<l^ 



-N 



;(^) 



s^\, and I = l^^ — i. Thus, we obtain 



,(■0 



^ and s)v+i,i 



'Af,l 



□ 



Lemma 5: Let $F(z) \in V(u, N-1)$, $s \le l$ with $s = \deg(F)$
for $l \in \Phi^{(s_2)}(a, N)$, and let $G(z) \in V(u, M-1)$, $t \le k$ with
$t = \deg(G)$ for $k \in \Phi^{(t_2)}(a, M)$. Suppose that $dG_k \neq 0$,
$M = o(k) < N = o(l)$ and $k_2 - t_2 = l_2 - s_2$. Then we have

$$H(z) := dG_k\, z^{r-s} F - dF_l\, z^{r-l+k-t} G \in V(u, N),$$

and $\deg(H) = r$, where $r := s$ if $dF_l = 0$, and
$r := (\max\{s_1,\ l_1 - k_1 + t_1\},\ s_2)$ otherwise. □
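Proof. Since $r_2 = s_2$ and

$$\begin{aligned}
o\left(z^{r-s}F\right) - o\left(z^{r-l+k-t}G\right)
&= r_1 a + s_2 b - (r_1 - l_1 + k_1)a - t_2 b \\
&= o(l) - o(k) > 0,
\end{aligned} \qquad (23)$$

we obtain $\deg(H) = r$. Next, since $F \in V(u, N-1)$ and
$G \in V(u, M-1)$, we have

$$\sum_{n \in \Phi(a,s)} F_n u_{n+p-s} =
\begin{cases}
0 & p \in \Phi^{(s_2)}(a, N-1),\ s \le p, \\
dF_l & p = l,
\end{cases}$$

$$\sum_{n \in \Phi(a,t)} G_n u_{n+p-t} =
\begin{cases}
0 & p \in \Phi^{(t_2)}(a, M-1),\ t \le p, \\
dG_k & p = k.
\end{cases}$$

We may assume that $dF_l \neq 0$. If $p \in \Phi^{(s_2)}(a, N-1)$ and
$r \le p$, then we have $p - l + k \in \Phi^{(t_2)}(a, M-1)$ and $t \le p - l + k$
from $l - k + t \le r$, and moreover,

$$\begin{aligned}
\sum_{n \in \Phi(a,r)} H_n u_{n+p-r}
&= dG_k \sum_{n \in \Phi(a,s)} F_n u_{n+(r-s)+p-r}
 - dF_l \sum_{n \in \Phi(a,t)} G_n u_{n+(r-l+k-t)+p-r} \\
&= \begin{cases}
0 & p \in \Phi^{(s_2)}(a, N-1),\ r \le p, \\
dG_k \cdot dF_l - dF_l \cdot dG_k = 0 & p = l.
\end{cases}
\end{aligned}$$

□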
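To make the division-free combination of Lemma 5 concrete, the following is a minimal Python sketch of its univariate analogue. It assumes a prime field GF(p), a single variable (so that $o(l) = l$ and the index set $\Phi(a,s)$ becomes $\{0, \dots, s\}$), and the usual Berlekamp-Massey initialization with a sentinel auxiliary polynomial. The names `discrepancy`, `inverse_free_bm` and `annihilates` are hypothetical and do not come from the paper; the code illustrates the update $H = dG_k z^{r-s}F - dF_l z^{r-l+k-t}G$ only, not the algorithm for curves.

```python
def discrepancy(F, u, l, p):
    """dF_l = sum_n F[n] * u[n + l - s] mod p, where s = deg F = len(F) - 1."""
    s = len(F) - 1
    return sum(F[n] * u[n + l - s] for n in range(s + 1)) % p


def inverse_free_bm(u, p):
    """Univariate, division-free Berlekamp-Massey sketch over GF(p), p prime.

    Polynomials are coefficient lists, lowest degree first. The result F
    satisfies discrepancy(F, u, l, p) == 0 for every l with deg F <= l < len(u).
    """
    F = [1]                  # current polynomial, degree s
    G, k, dG = [1], -1, 1    # sentinel auxiliary polynomial (assumed initialization)
    for N in range(len(u)):
        s, t = len(F) - 1, len(G) - 1
        dF = discrepancy(F, u, N, p)
        if dF == 0:
            continue         # F already fits the sequence at index N
        r = max(s, N - k + t)  # new degree, as in Lemma 5 with l = N
        # H = dG * x^(r-s) * F  -  dF * x^(r-N+k-t) * G, with no field inversion
        first = [0] * (r - s) + [dG * c for c in F]
        second = [0] * (r - N + k - t) + [dF * c for c in G]
        second += [0] * (len(first) - len(second))
        H = [(x - y) % p for x, y in zip(first, second)]
        if r > s:
            # degree grew: the old F becomes the auxiliary polynomial
            G, k, dG = F, N, dF
        F = H
    return F


def annihilates(F, u, p):
    """Check that every discrepancy of F on u vanishes."""
    s = len(F) - 1
    return all(discrepancy(F, u, l, p) == 0 for l in range(s, len(u)))
```

For the Fibonacci sequence modulo 7, `inverse_free_bm([0, 1, 1, 2, 3, 5, 1, 6], 7)` returns `[6, 6, 1]`, that is, $x^2 - x - 1$ modulo 7. For a sequence whose first discrepancy is not 1, such as `[3, 6, 5, 3, 6, 5]` modulo 7, the result `[1, 3]` is a nonzero scalar multiple of the monic connection polynomial, which is the price paid for avoiding inversions.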



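Proof of Theorem 1. If $d_N^{(i)} \neq 0$ and $G_N^{(\bar\imath)} = 0$, then
$s_{N+1,1}^{(i)} := l_{N,1}^{(i)} + 1$ and
$F_{N+1}^{(i)} := z^{s_{N+1}^{(i)} - s_N^{(i)}} F_N^{(i)}$. Thus
$F_{N+1}^{(i)} \in V(u, N)$ and $\deg(F_{N+1}^{(i)}) = s_{N+1}^{(i)}$ hold.
Supposing that $G_N^{(\bar\imath)} \neq 0$, let $M < N$ be satisfying
$G_N^{(\bar\imath)} = F_M^{(j)}$, $o(k) = M$ with $k := l_M^{(j)}$, and
$\bar\imath = k_2 - j$. Then we have $c_N^{(\bar\imath)} = k - s_M^{(j)}$. The
theorem except for (13) and (14) follows from Lemma 5. We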
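prove (14) by induction. The case of $N = 0$ in (14) holds by
the definition. Supposing that the equality is true for $s_{N,1}^{(i)}$, we
prove it for $s_{N+1,1}^{(i)}$. Let $\zeta_{N+1,1}^{(i)}$ be the minimum of
$\zeta_1$ in (14). If (P), then
$s_{N+1,1}^{(i)} = s_{N,1}^{(i)} \le \zeta_{N+1,1}^{(i)}$, thus
$s_{N+1,1}^{(i)} = \zeta_{N+1,1}^{(i)}$ holds. If $d_N^{(i)} \neq 0$ and
$l_{N,1}^{(i)} - c_{N,1}^{(\bar\imath)} > s_{N,1}^{(i)}$,
then we have $d_N^{(\bar\imath)} \neq 0$ as in the proof of Lemma 4 and
$\zeta_{N+1,1}^{(i)} \le s_{N+1,1}^{(i)} = l_{N,1}^{(i)} - s_{N,1}^{(\bar\imath)} + 1$,
which is actually the equation $\zeta_{N+1,1}^{(i)} = s_{N+1,1}^{(i)}$ by
Lemma 3 for $F_N^{(\bar\imath)} \in V(u, N-1)$ and $F \in V(u, N)$ satisfying
$\deg(F) = (\zeta_{N+1,1}^{(i)}, i)$. Finally,
as for (13), if we suppose $s_{N,1}^{(i)} < s_{N,1}^{(j)}$ with $i < j$, then we
have $y^{j-i} F_N^{(i)} \in V(u, N-1)$ and
$\deg(y^{j-i} F_N^{(i)}) = (s_{N,1}^{(i)}, j)$,
which contradict the minimality of $s_{N,1}^{(j)}$. □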
Thus we have proved the theorem for an algorithm that is
not a parallel version, i.e., the algorithm with direct calculation
of $d_N^{(i)}$ from its defining formula, without $v_N^{(i)}$ and $w_N^{(i)}$.
To prove our parallel inverse-free BMS algorithm described in
Section III, we have to show further that $d_N^{(i)}$ is obtained
by the coefficient of $v_N^{(i)}$; we omit this procedure and refer
to similar cases [14], [16] of the ordinary parallel BMS algorithm.